















LARGE-SAMPLE THEORY: PARAMETRIC CASE¹


By HERMAN CHERNOFF


Stanford University 


1. Introduction. Large-sample theory is a branch of statistics which seems to 
have developed because the existence of certain theorems in the theory of prob- 
ability made it relatively easy to obtain good approximate results if the sample 
size is large. These theorems, like the law of large numbers and the central limit 
theorem, are extremely elegant, and frequently their elegance is captured by these 
“easily” obtained results. This elegance has undoubtedly stimulated a great 
many people to do work in statistics. 

However, since one is seldom faced with an infinite sample, it is relevant to 
ask whether asymptotic results are useful, and if so, where. In particular, one is 
often asked whether a given sample size is large enough to justify the use of 
asymptotic results. Frequently this question is embarrassing, and no answer is 
available simply because the answer would involve the solution of the more dif- 
ficult finite-sample-size problem and the use of nonexistent related tables. In 
some cases, where this question has been treated, it has been shown that these 
asymptotic results are very good approximations. One example is the study 
wherein it was shown that the chi-square goodness-of-fit statistic has approxi- 
mately the chi-square distribution for rather small sample sizes [1]. 

Even though results of this type are not available for a particular problem, the 
study of the large-sample case could be justified on other grounds. Asymptotic 
solutions of a problem frequently give insight into what constitutes a reasonable 
procedure for the finite-sample-size case. Everyone who has had the experience 
of seeing how obvious the solution to a certain problem is after spending hours de- 
riving it can appreciate how suggestive an asymptotic result can be for the finite- 
sample-size problem. For somewhat similar reasons, the method of maximum 
likelihood estimation, which has various good large-sample properties, has be- 
come extremely popular, even for small samples. In fact, a glance at the litera- 
ture gives the impression that the property of being a maximum likelihood esti- 
mate has almost been adopted as the criterion of optimality. 

In this paper we deal with the parametric case. Ordinarily this is assumed to
mean that our observations come from a population whose distribution is specified
by the value of a parameter $\theta$, which may be a $k$-dimensional vector. A
specific problem would be that of testing whether two normal populations with
the same variance have the same mean. It seems that once more we must face
the fact that our problems may not reflect reality completely. There is a considerable
class of problems for which the parametric formulation is more than a


1 Presented as a special invited address at the Annual Meeting of the IMS in Berkeley, 
California, December 27, 1954. This work was prepared with the partial support of the 
Office of Naval Research. 









convenient and very rough approximation. On the other hand, there is a considerable
class for which this is not so. Even in these cases the same sort of reasoning
which was advanced to advocate the study of large-sample results is apropos
to justify the study of parametric theory, and even of its application to problems
where the parametric formulation seems quite rough.

Another point of some interest is that the normal distribution, on occasion, 
plays the role of a worst distribution. In such cases one may obtain quasi- 
maximum likelihood estimates, i.e., estimates derived by the use of maximum 
likelihood on the not necessarily correct assumption that certain random vari- 
ables are normally distributed. These estimates may be inefficient compared 
with the true maximum likelihood estimates. Still, these quasi-maximum likeli- 
hood estimates have the same or as good asymptotic distributions as they would 
have were the assumptions of normality correct. They also have the advantage 
that their computation does not involve the knowledge of the true distribution of 
these variables. Some complex examples are treated in [2]. A trivial example 
which illustrates this point is the following. On the basis of a sample of n inde- 
pendent observations, estimate the mean of the population when it is assumed 
to be normal and it really is rectangular. Here, $\bar{X}$ is the quasi-maximum likelihood
estimate and the true maximum likelihood estimate is $\hat{\mu}$, the average of the
smallest and largest observations. The asymptotic distribution of $\sqrt{n}(\bar{X} - \mu)$
is normal with mean 0 and variance $\sigma^2$ (the variance of the population), whether
the population is normal or rectangular. However, if it is rectangular, $\hat{\mu}$ will
be considerably more efficient.
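
The rectangular example can be checked by simulation. The following is a minimal sketch, not part of the original argument: it assumes a Uniform(0, 1) population, and the sample size, number of replications, and seed are arbitrary choices made for illustration. It estimates the mean squared error of the sample mean and of the midrange (the average of the smallest and largest observations).

```python
import random
import statistics


def simulate(n=200, reps=2000, seed=0):
    """Estimate the MSE of the sample mean and of the midrange for Uniform(0, 1) samples."""
    rng = random.Random(seed)
    mu = 0.5  # true mean of the assumed Uniform(0, 1) population
    mean_sq_err = 0.0
    mid_sq_err = 0.0
    for _ in range(reps):
        xs = [rng.random() for _ in range(n)]
        xbar = statistics.fmean(xs)            # quasi-maximum likelihood estimate
        midrange = (min(xs) + max(xs)) / 2     # average of smallest and largest observations
        mean_sq_err += (xbar - mu) ** 2
        mid_sq_err += (midrange - mu) ** 2
    return mean_sq_err / reps, mid_sq_err / reps


mse_mean, mse_mid = simulate()
print(f"MSE of sample mean: {mse_mean:.2e}")
print(f"MSE of midrange:    {mse_mid:.2e}")
```

Under these assumptions the variance of the sample mean is $1/(12n)$, while that of the midrange is $1/[2(n+1)(n+2)]$, so the gap widens as $n$ grows.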

This paper will be divided into two main parts. In the first I shall summarize 


several techniques and results which are useful tools in the study of large-sample 
theory and which, I feel, have been unfortunately neglected in the literature. 
In the second part I shall consider some results in inference in the large-sample 
parametric case. There, much of the space will be devoted to material which 
has been of special interest to me. In this way I hope to communicate some of 
my outlook rather than merely to present a long list of accomplishments. 


Part 1 


2. Stochastic limit and order relationships. The title of this section is taken 
from that of a paper of Mann and Wald [3]. Their stated purpose was to provide 
readers with certain general results which would eliminate the necessity on the 
part of future authors of laboriously proving special cases, not to mention con- 
fusing the readers. This aim seems to have been largely frustrated mainly by the 
fact that the paper was practically forgotten. I wish to discuss some of these 
general results and notations and some useful generalizations of these. 

In standard notation one writes $a_n = O(r_n)$ if $\{a_n\}$ is a sequence of real numbers
and $\{r_n\}$ is a sequence of positive numbers such that $a_n / r_n$ is bounded.
If $a_n / r_n \to 0$ as $n \to \infty$, one writes $a_n = o(r_n)$. This notation is frequently
convenient and suggestive. For example, if $a_n \to 0$ and $b_n$ is bounded, it follows that
$a_n b_n \to 0$. This may be simply written as follows: $o(1)\,O(1) = o(1)$.







An analogous notation may be defined for sequences of chance variables
$\{x_n\}$. We may write $x_n = O_p(r_n)$ ($x_n / r_n$ is bounded in probability) if for each
$\epsilon > 0$ there is an $M_\epsilon$ and an $N_\epsilon$ such that

\[
\Pr\{|x_n| \ge M_\epsilon r_n\} \le \epsilon \quad \text{for } n \ge N_\epsilon .
\]

Finally, we may write $x_n = o_p(r_n)$ ($|x_n| / r_n$ approaches zero in probability) if

\[
\Pr\{|x_n| \ge \epsilon r_n\} \to 0 \quad \text{as } n \to \infty \text{ for each } \epsilon > 0 .
\]

It might be well to note here that these concepts are easily extendable to the
case where $x_n$ is not necessarily a real chance variable, but where $x_n$ may take on
values in an arbitrary space on which an “absolute value” is defined.

One of the results obtained by Mann and Wald is part of their Corollary 1,
which states essentially that the algebra of $o$ and $O$ extends to $o_p$ and $O_p$. A
paraphrase of this result, which I have found very useful, is due to John Pratt
and is stated as follows: Suppose that $\{x_n\}$ is a sequence of chance variables defined
on an arbitrary space. Let $\{g_n(x_n)\}$ and $\{f_n^{(j)}(x_n)\}$, $j = 1, 2, \ldots, k$, be
$k + 1$ sequences of measurable functions, and let $\{r_n\}$ and $\{r_n^{(j)}\}$ be $k + 1$
sequences of positive numbers.

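THEOREM 1. Suppose that

\[
\text{(1)}\qquad
\begin{aligned}
f_n^{(j)}(x_n) &= O_p(r_n^{(j)}), & j &= 1, 2, \ldots, h, \\
f_n^{(j)}(x_n) &= o_p(r_n^{(j)}), & j &= h+1, h+2, \ldots, k,
\end{aligned}
\]

and that
(2) for any (nonrandom) sequence $\{a_n\}$ for which

\[
f_n^{(j)}(a_n) = O(r_n^{(j)}), \qquad j = 1, 2, \ldots, h,
\]

and

\[
f_n^{(j)}(a_n) = o(r_n^{(j)}), \qquad j = h+1, h+2, \ldots, k,
\]

hold, it follows that $g_n(a_n) = O(r_n)$.

Then, it follows that $g_n(x_n) = O_p(r_n)$. Furthermore, if the last line of (2) is replaced
by $g_n(a_n) = o(r_n)$, the conclusion is $g_n(x_n) = o_p(r_n)$.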

The following are some examples which may serve to illustrate the use of this 
result. 

EXAMPLE 1. If $y_n \xrightarrow{p} y$, i.e., if $y_n$ approaches $y$ in probability, or $y_n - y = o_p(1)$,
and if $z_n \xrightarrow{p} z$, then $y_n z_n \xrightarrow{p} yz$. This result follows because we are given,
on the one hand, that $y_n - y = o_p(1)$ and $z_n - z = o_p(1)$. On the other hand,
it is easy to prove (and is well known) that $b_n - b = o(1)$ and $c_n - c = o(1)$
(i.e., that $b_n \to b$ and $c_n \to c$) imply that $b_n c_n - bc = o(1)$. Consequently,

\[
y_n z_n - yz = o_p(1).
\]


Several remarks may be made about this example. It may seem to involve a 
tremendous amount of machinery for a very simple result. In fact, a direct proof 







may seem to be no more difficult than the ‘‘on-the-other-hand” part. Actually, 
my own experience in class has shown that the direct proof is usually instructive 
because students find it so difficult. The tremendous machinery is not so tre- 
mendous if this approach is used frequently, for then it becomes standard. 
Finally, this example illustrates how this approach clearly separates the non- 
stochastic asymptotic elements of a problem from the stochastic elements. 

One point which may not have been thoroughly clarified in the above exposition
is the specification of $x_n$, $f_n^{(j)}$, $g_n$, and $a_n$ in this example. To be perfectly
specific, we may let

\[
\begin{aligned}
x_n &= (y_n, z_n, y, z); & f_n^{(1)}(x_n) &= y_n - y; \\
g_n(x_n) &= y_n z_n - yz; & f_n^{(2)}(x_n) &= z_n - z; \\
a_n &= (b_n, c_n, b, c).
\end{aligned}
\]


EXAMPLE 2. If $x_n = o_p(1)$, it follows that $\sin x_n / \sqrt{x_n} = o_p(1)$. All that needs
to be shown is that $\sin a_n / \sqrt{a_n} \to 0$ if $a_n \to 0$.

EXAMPLE 3. The following is the simplest of several results which concern
Taylor Series Expansions.

COROLLARY 1. If

\[
\text{(1)}\qquad x_n = a + O_p(r_n),
\]

where $r_n \to 0$, and
(2) $f(x)$ has $s$ continuous derivatives at $x = a$, then

\[
f(x_n) = f(a) + (x_n - a) f'(a) + \cdots + \frac{(x_n - a)^s f^{(s)}(a)}{s!} + o_p(r_n^s).
\]


The following is a considerably more sophisticated example. Here the sepa- 
ration of stochastic and nonstochastic elements is a blessing, for the problem is 
not completely trivial under the best of circumstances. 

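EXAMPLE 4. Suppose that $x_1, x_2, \ldots, x_n$ are $n$ independent observations on a
chance variable with density

\[
f(x \mid \alpha, \beta, \gamma) =
\begin{cases}
\beta & \text{for } 0 \le x \le \alpha, \\
\gamma & \text{for } \alpha < x \le 1,
\end{cases}
\]

where $\alpha\beta + (1 - \alpha)\gamma = 1$, $0 < \alpha < 1$, $\beta > 0$, $\gamma > 0$, and $\beta \ne \gamma$. It is not
difficult to show that the maximum likelihood estimate $\hat{\alpha}_n$ of $\alpha$ maximizes

\[
\left[\frac{F_n(\alpha)}{\alpha}\right]^{F_n(\alpha)}
\left[\frac{1 - F_n(\alpha)}{1 - \alpha}\right]^{1 - F_n(\alpha)},
\]

where $F_n$ is the sample c.d.f.; i.e., $F_n(x)$ is $1/n$ times the number of observations
less than or equal to $x$. (Note that the function $F_n$ is itself random.) We may
write $\hat{\alpha}_n = \phi(F_n)$. A proof of the consistency of $\hat{\alpha}_n$ (i.e., that $\hat{\alpha}_n \xrightarrow{p} \alpha_0$ if $\alpha_0$ is
the true value of the parameter) is partially complicated by the possibility that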






$\hat{\alpha}_n$ may get close to zero or one. We shall merely outline the proof that $\hat{\alpha}_n$ is
bounded away from zero in probability, i.e., $1/\hat{\alpha}_n = O_p(1)$.
First, it is known that

\[
\text{(1)}\qquad \sup_{0 \le x \le 1} |F_n(x) - F_0(x)| = o_p(1),
\]

where $F_0(x)$ is the true c.d.f., and it can be shown that

\[
\text{(2)}\qquad \sup_{0 < x < 1} \frac{F_n(x)}{x} = O_p(1).
\]

Secondly, it can be shown that if $\{G_n\}$ is a sequence of nonrandom c.d.f.’s
such that

\[
\text{(1')}\qquad \sup_{0 \le x \le 1} |G_n(x) - F_0(x)| = o(1),
\]

\[
\text{(2')}\qquad \sup_{0 < x < 1} \frac{G_n(x)}{x} = O(1),
\]

then $1/\phi(G_n) = O(1)$. It follows that $1/\hat{\alpha}_n = O_p(1)$.

One may observe that the role of $x_n$ in Theorem 1 is played here by the sample
c.d.f. $F_n$.

Another important consideration in the Mann-Wald paper involves a generalization
of a well-known result which states that if $x_n$ has a limiting distribution,
then for a continuous function $g$, $g(x_n)$ has the corresponding limiting distribution.
Hence, if $x_n$ is asymptotically normally distributed with mean 0 and variance 1,
$x_n^2$ has an asymptotic chi-square distribution with one degree of freedom.
This result was generalized to allow for the possibility that $g$ has points of discontinuity.
Unfortunately, through an oversight, a slightly weaker result than
could have been obtained was presented. The stronger version will be stated
after we introduce some appropriate notation.

We write $\mathcal{L}(x_n) \to \mathcal{L}(x)$ (read: the distribution law of $x_n$ converges to the
distribution law of $x$) or $\lim_{n \to \infty} \mathcal{L}(x_n) = \mathcal{L}(x)$ if $F_n(a) \to F(a)$ at every point $a$
of continuity of $F$, where $F_n$ and $F$ are the c.d.f.’s of $x_n$ and $x$, respectively.
Here, $\mathcal{L}(x_n)$ and $\mathcal{L}(x)$ represent the probability measures associated with $x_n$
and $x$. Let $D(g)$ be the set of discontinuities of the function $g$.

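THEOREM 2. If

\[
\text{(1)}\qquad \mathcal{L}(x_n) \to \mathcal{L}(x)
\]

and

\[
\text{(2)}\qquad \mathcal{L}(x; D(g)) = P\{x \in D(g)\} = 0,
\]

then

\[
\mathcal{L}[g(x_n)] \to \mathcal{L}[g(x)].
\]

EXAMPLE 1. If $\mathcal{L}(x_n, y_n) \to \mathcal{L}(x, y)$, where $x$ and $y$ are independently and normally
distributed with mean 0 and variance 1, then $\mathcal{L}(x_n / y_n) \to \mathcal{L}(x / y)$, which
is a Cauchy distribution.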
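
The Cauchy limit in this example can be illustrated numerically. The sketch below is only an illustration under assumptions not in the text: $x_n$ and $y_n$ are taken to be independent standardized sums of $n$ Uniform$(-1, 1)$ variables, and the threshold, replication count, and seed are arbitrary. For a Cauchy variable, $P\{|x/y| \le t\} = (2/\pi)\arctan t$, which equals $1/4$ at $t = \tan(\pi/8)$.

```python
import math
import random


def ratio_of_standardized_sums(n, rng):
    """Return x_n / y_n for two independent standardized sums of n Uniform(-1, 1) draws."""
    scale = math.sqrt(n / 3.0)  # standard deviation of a sum of n Uniform(-1, 1) variables
    x = sum(rng.uniform(-1, 1) for _ in range(n)) / scale
    y = sum(rng.uniform(-1, 1) for _ in range(n)) / scale
    return x / y  # the discontinuity set {y = 0} has probability zero, as condition (2) requires


def fraction_within(t, n, reps=20000, seed=1):
    """Monte Carlo estimate of P{|x_n / y_n| <= t}."""
    rng = random.Random(seed)
    hits = sum(abs(ratio_of_standardized_sums(n, rng)) <= t for _ in range(reps))
    return hits / reps


t = math.tan(math.pi / 8)
cauchy_value = 2 * math.atan(t) / math.pi   # limiting Cauchy probability, equal to 1/4
frac_n1 = fraction_within(t, n=1)           # far from the limit: exact value is t / 2
frac_n30 = fraction_within(t, n=30)         # close to the Cauchy value
print(cauchy_value, frac_n1, frac_n30)
```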







This theorem was extended by Rubin [4] to the case where $x_n$ and $x$ take on
values in a topological space $X$. Here, the notion of convergence in distribution
law must be extended. Rubin uses the following definition:²

\[
\mathcal{L}_n \to \mathcal{L} \quad \text{if for every closed set } S, \quad
\mathcal{L}(S) \ge \limsup_{n \to \infty} \mathcal{L}_n(S)
\]

or, equivalently,

\[
\mathcal{L}_n \to \mathcal{L} \quad \text{if for every open set } S, \quad
\mathcal{L}(S) \le \liminf_{n \to \infty} \mathcal{L}_n(S).
\]

For many spaces, in particular for metric spaces, this definition coincides with
the following one used by other authors [5], [6]: $\mathcal{L}_n \to \mathcal{L}$ if for every bounded
continuous function $h$,

\[
\int h(x)\, d\mathcal{L}_n(x) \to \int h(x)\, d\mathcal{L}(x).
\]


Both of these are extensions of the definition for Euclidean spaces. With Rubin’s 
definition, it follows rather easily that Theorem 2 applies whenever g is a measur- 
able transformation from one topological space into another. 

Rubin [7] has applied this result to find the limiting distribution of quasi-maximum
likelihood estimates of the parameters of certain sets of simultaneous
linear stochastic difference equations. Donsker [8] derived a related result while
engaged in the justification of a heuristic derivation of the asymptotic distribution
of the Kolmogorov-Smirnov statistic given by Doob [9]. It is interesting to
note that in terms of our Theorem 2, Doob’s paper dealt mainly with finding the
distribution of $g(x)$, after indicating that it seemed reasonable to expect that in
some sense $\mathcal{L}(x_n) \to \mathcal{L}(x)$. There the role of $x_n$ was played by the sample c.d.f.
in the Kolmogorov-Smirnov problem.

The above exposition is far from complete. For example, the following result 
for Euclidean spaces is rather useful. Note that it can be reworded so as to be 
extended to metric spaces. 

THEOREM 3. If $\mathcal{L}(x_n) \to \mathcal{L}(x)$, then $\mathcal{L}(x_n + o_p(1)) \to \mathcal{L}(x)$.

Furthermore, it seems to me that there still remains some work to be done
with a view to making the application of Theorems 1 and 2 more cut and dried.
Finally, it should be remarked that direct derivations which do not separate the
stochastic and asymptotic elements of the problem are sometimes simpler and
neater than the techniques suggested by the above results.


3. The Cramér extension of the central limit theorem. In 1938, Cramér [10] 
obtained an elegant extension of the central limit theorem which, for some reason, 
seemed to have been overlooked by statisticians. This seems to have been un- 
fortunate, since it appears to be more relevant than the central limit theorem in 
many statistical applications. 

The central limit theorem is loosely described as follows. The average $\bar{X}_n$


2 The Borel field associated with the distributions is assumed to be that generated by the 
closed sets. 







of $n$ observations on a chance variable $X$ is approximately normally distributed.
More precisely,

\[
\Pr\left\{ \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \le a \right\}
\to \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt
\quad \text{as } n \to \infty,
\]

if $\bar{X}_n$ is the average of $n$ independent observations on a chance variable with
mean $\mu$ and variance $\sigma^2$. Suppose, now, that $a$ is not fixed but is replaced by $a_n$,
where $a_n \to -\infty$ as $n \to \infty$. Then both sides of the above expression would
approach zero. In this sense, the above equation could be considered to be still
valid. Even so, it is of importance, as we shall see in Section 6, to determine how
fast each side approaches zero and whether the two sides are asymptotically
equivalent, i.e., whether the ratio of the two terms approaches one.

In fact, Cramér has essentially shown that as long as $a_n$ does not approach
$-\infty$ too rapidly, the two sides are roughly equivalent. However, this result
fails to hold when $a_n$ is of the order of magnitude of $\sqrt{n}$. Note that if $a_n = -b\sqrt{n}$,
$b > 0$, we are essentially interested in $\Pr\{\bar{X} \le c\}$, where $c < \mu$. This case is an
especially important one. Here, it is shown that, roughly speaking, $\Pr\{\bar{X} \le c\} \sim m^n$,
where $m = \inf_t E\{e^{t(X - c)}\}$.
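
As a concrete illustration of the rate $m^n$, the following sketch assumes (purely for the example) that $X$ is a Bernoulli variable with success probability $p$, for which the infimum has the closed form obtained by setting the derivative of $E\{e^{t(X-c)}\}$ to zero. It compares the exact binomial tail with $m^n$; the values of $p$, $c$, and $n$ are arbitrary choices.

```python
import math


def mgf_shifted(t, p, c):
    """E{exp(t (X - c))} for X ~ Bernoulli(p)."""
    return math.exp(-t * c) * (1 - p + p * math.exp(t))


def chernoff_m(p, c):
    """m = inf_t E{exp(t (X - c))}; the minimizing t is negative when c < p."""
    t_star = math.log(c * (1 - p) / (p * (1 - c)))
    return mgf_shifted(t_star, p, c), t_star


def exact_lower_tail(n, p, c):
    """Pr{mean of n Bernoulli(p) observations <= c}, by summing binomial terms."""
    k_max = math.floor(n * c + 1e-12)  # small guard against floating-point round-off
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_max + 1))


p, c = 0.5, 0.3
m, t_star = chernoff_m(p, c)
results = {}
for n in (10, 100, 1000):
    tail = exact_lower_tail(n, p, c)
    results[n] = tail
    # (1/n) log of the tail probability approaches log m as n grows
    print(n, tail, m**n, math.log(tail) / n, math.log(m))
```

The exact tail never exceeds $m^n$, and the per-observation logarithmic rate approaches $\log m$; the remaining gap is the slowly varying factor that Theorem 2 below makes precise.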

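A result of Esseen [11] permits us to eliminate one of the conditions which
Cramér had to apply, and which led him to obtain weaker results for the case
where the chance variable is discrete. We shall state a version of Cramér’s
result.

THEOREM 1. If $E(e^{tX}) < \infty$ in some neighborhood of $t = 0$, and if $a_n < -1$,
and $a_n = o(\sqrt{n})$, then

\[
\Pr\left\{ \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \le a_n \right\}
= \left[ \int_{-\infty}^{a_n} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt \right]
\exp\left\{ \frac{a_n^3}{\sqrt{n}}\, \lambda\!\left( \frac{a_n}{\sqrt{n}} \right) \right\}
\left[ 1 + O\!\left( \frac{a_n}{\sqrt{n}} \right) \right],
\]

where $\lambda(t)$ is an analytic function of $t$ whose coefficients depend on the moments
of $X$.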


A similar result is obtainable for the case where $a_n$ is positive. Note that

\[
\frac{a_n^3}{\sqrt{n}}\, \lambda\!\left( \frac{a_n}{\sqrt{n}} \right)
\]

may become large if $a_n$ is larger in magnitude than $n^{1/6}$. However, this term
contributes a relatively unimportant amount compared with the normal approximation
term, which is asymptotically equivalent to $(\sqrt{2\pi}\,|a_n|)^{-1} \exp(-a_n^2/2)$.

THEOREM 2. If $E(e^{tX}) < \infty$ for $t$ in some neighborhood of 0 and $c < E(X)$, then

\[
P(\bar{X}_n \le c) = \frac{m^n}{(2\pi n)^{1/2}}
\left[ b_0 + \frac{b_1}{n} + \cdots + \frac{b_{k-1}}{n^{k-1}} + O\!\left( \frac{1}{n^k} \right) \right],
\]

where $b_0 > 0$ and $m = \inf_t E(e^{t(X - c)})$; the quantities $b_i$ depend on $c$, and $k$ is an
arbitrary positive integer.







Cramér’s results were generalized by Feller [12] for the case where the ob- 
servations do not necessarily have the same distribution. 


Part 2


4. Estimation. The development of the large-sample theory of estimation was 
given great impetus with the publication by Fisher [13], [14] of his works on esti- 
mation, where he proposed the method of maximum likelihood and suggested, 
among others, the concepts of consistency, efficiency, and sufficiency. The im- 
portance of the notions Fisher developed was soon recognized and the method 
of maximum likelihood became very popular among statisticians. However, 
these notions and the properties of the method of maximum likelihood were some- 
what more complicated than Fisher or his immediate followers realized. Conse- 
quently, many proofs dealing with the properties of these estimates were found 
to be in error. Considerable light was thrown on these complications when J. L. 
Hodges, Jr., produced an example of superefficiency. This concept was later 
treated by Le Cam [15], who also presented an excellent historical survey of the 
field of maximum likelihood estimation. We shall discuss these notions very 
briefly, referring the reader to Le Cam’s paper for a more detailed discussion. 

Let $X$ be a chance variable whose distribution is determined by the value of a
parameter $\theta$ which is assumed to be in a prescribed set $\Omega$. For the purpose of
large-sample theory, Fisher defines an estimate $T$ as a sequence of functions
$\{T_n = T_n(X_1, X_2, \ldots, X_n)\}$, where $T_n(X_1, \ldots, X_n)$ represents the “estimated”
point of $\Omega$ when a sample $X_1, \ldots, X_n$ of $n$ independent observations on
$X$ is observed.³

DEFINITION 1. $T$ is consistent if $T_n(X_1, \ldots, X_n) \to \theta$ in probability as $n \to \infty$.

Suppose that the distribution of $X$ is characterized by the density $f(x, \theta)$.
Then, an estimate $T^*$ is a maximum likelihood estimate of $\theta$ if $\prod_{i=1}^{n} f(X_i, \theta)$
assumes its maximum value at $\theta = T_n^*(X_1, X_2, \ldots, X_n)$. (In most applications,
the class of distributions may be represented by densities with respect to some
$\sigma$-finite measure.) It may turn out that the maximum likelihood estimate does
not exist. For example, there will be no such estimate for the mean $\mu$ of a normal
distribution if it is assumed that $\mu$ is in the open interval $(-1, 1)$ and that the
sample mean is greater than one.

When Fisher introduced the notion of asymptotic efficiency, he did this for
the case where $\theta$ was assumed to be on the real line. Then $T$ was said to be
asymptotically efficient if its asymptotic distribution (when properly normalized)
was normal with no larger variance than that obtained for any other consistent
asymptotically normally distributed statistic. (The variance of the asymptotic
distribution will be called the asymptotic variance and is, in general, no larger
than the limit of the variance of the normalized estimate.) Apparently, the restriction
to asymptotically normally distributed statistics was felt necessary,
because Fisher had no way of comparing two dissimilar limiting distributions.


3 The extension of this notion to the case where the observations need not be independent 
nor identically distributed is rather evident and we shall not formally treat of that case here. 







Fisher and various followers claimed that under suitable mild restrictions the
maximum likelihood estimates were consistent and efficient. That the attempts
to establish efficiency with the above definition would encounter grave difficulties
seems clear when an example of superefficiency is given. Le Cam’s example
is that of observations from a normal population with unknown mean $\mu$ and
variance 1. Let $T_n$ represent the maximum likelihood estimate which is the mean
of $n$ observations and let $T_n'$ be defined as follows:

\[
T_n' =
\begin{cases}
T_n & \text{if } |T_n| \ge \dfrac{1}{n^{1/4}}, \\[1ex]
\alpha T_n & \text{if } |T_n| < \dfrac{1}{n^{1/4}},
\end{cases}
\]

where $\alpha$ is an arbitrary constant. Then it is clear that⁴

\[
\mathcal{L}\{\sqrt{n}(T_n - \mu)\} \to N(0, 1),
\]

while

\[
\mathcal{L}\{\sqrt{n}(T_n' - \mu)\} \to N(0, 1) \quad \text{if } \mu \ne 0,
\]

but

\[
\mathcal{L}\{\sqrt{n}(T_n' - \mu)\} \to N(0, \alpha^2) \quad \text{if } \mu = 0.
\]

Hence, if $0 < \alpha^2 < 1$, $T_n'$ is asymptotically normally distributed with asymptotic
variance which is never larger, and sometimes smaller, than that of $T_n$.
Let us call the set of $\theta$ on which a statistic $T_n'$ is more “efficient” than the maximum
likelihood estimate $T_n$ the set of superefficiency. Le Cam has shown
under certain conditions that a set of superefficiency must have Lebesgue measure
zero. In this sense the maximum likelihood estimate is efficient.
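
The behavior of the shrunken estimate can be seen in a small simulation. This is a sketch under illustrative assumptions: the values $n = 10{,}000$ and $\alpha = 1/2$, the three test values of $\mu$, the replication count, and the seed are arbitrary, and the sample mean is drawn directly from its exact $N(\mu, 1/n)$ distribution rather than computed from raw observations.

```python
import math
import random


def shrunken(t_n, n, alpha):
    """Return t_n if |t_n| >= n^(-1/4), and alpha * t_n otherwise."""
    return t_n if abs(t_n) >= n ** -0.25 else alpha * t_n


def normalized_mse(mu, n, alpha, reps=20000, seed=0):
    """Monte Carlo estimates of n E(T - mu)^2 for the sample mean and the shrunken estimate."""
    rng = random.Random(seed)
    total_mle = 0.0
    total_shrunk = 0.0
    for _ in range(reps):
        t_n = rng.gauss(mu, 1 / math.sqrt(n))  # mean of n N(mu, 1) observations
        total_mle += n * (t_n - mu) ** 2
        total_shrunk += n * (shrunken(t_n, n, alpha) - mu) ** 2
    return total_mle / reps, total_shrunk / reps


n, alpha = 10_000, 0.5
at_zero = normalized_mse(0.0, n, alpha)    # shrunken estimate: about alpha^2
at_one = normalized_mse(1.0, n, alpha)     # both estimates: about 1
near_zero = normalized_mse(0.05, n, alpha) # shrunken estimate does much worse here
print(at_zero, at_one, near_zero)
```

In this run the gain at $\mu = 0$ is paid for at nearby parameter values such as $\mu = 0.05$, where the normalized mean squared error of the shrunken estimate is several times that of the sample mean.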

In his paper Le Cam makes use of Wald’s decision-theory formulation [16]
of the estimation problem. (Similar techniques were independently applied by
Wolfowitz [17].) His definition of efficiency and superefficiency involves the loss
function $L_n(t, \theta)$, which is introduced to represent the loss to the statistician
when he observes a sample of size $n$ and estimates $t$, while $\theta$ is the true value of
the parameter. Le Cam derives and uses the properties of Bayes’ estimates in
his attack. I wish to indicate an alternative approach which yields somewhat
weaker results but which will be useful to us later. We may assume that $L$ is
measured in terms of negative utility [18], so that it makes sense to attempt to
select $T$ so as to minimize the “risk” or expected loss $E\{L_n(T_n, \theta)\}$. Then, corresponding
to an estimate $T$, we have a sequence of risk functions

\[
R_n(T_n, \theta) = E\{L_n(T_n(X_1, \ldots, X_n), \theta)\}.
\]


This formulation permits us to compare estimates which are (1) not necessarily 
confined to the real numbers and (2) do not necessarily have similar distribu- 


4 N(0, 1) represents the normal distribution with mean 0 and variance 1. 







tions. However, a difficulty appears. First of all, it is usually quite difficult to 
evaluate the loss function that the statistician really faces. On the other hand, 
in many cases, it is reasonable to assume that $L_n(t, \theta)$ is a minimum at $t = \theta$
and is well behaved near $t = \theta$. Hence, it is often reasonable to assume (in the
one-dimensional case) that for $t$ close to $\theta$, $L_n(t, \theta)$ is approximately

\[
c_{0n}(\theta) + c_{2n}(\theta)(t - \theta)^2,
\]

where $c_{2n}(\theta) > 0$. Intuitively, this would seem to furnish a good excuse for
selecting estimates which minimize the second moment about $\theta$. However, some
misgivings may arise when we note that $\lim_{n \to \infty} n\{E(T_n - \theta)^2\}$ and the variance
of the limiting distribution of $\sqrt{n}(T_n - \theta)$ need not coincide. In extreme cases,
it is possible for an estimate to have a very good asymptotic distribution but
have infinite variance for each sample size. This estimate would not show up
well if we used $E\{(T_n - \theta)^2\}$ as a criterion. In fact, a utility function which
satisfies the von Neumann-Morgenstern axioms [18] must be bounded. Hence,
$L_n(t, \theta)$ should be taken to be bounded, whereas the above approximation, which
may be reasonable for $t$ close to $\theta$, is not. It is difficult to say what is an appropriate
criterion without referring to the true $L_n(t, \theta)$. One might propose the
asymptotic variance of $T_n - \theta$ (when suitably normalized), but objections
could easily be raised against this.
Suppose that one considered estimates $T$ such that

\[
T_n - \theta = O_p(1/\sqrt{n}).
\]

Let us treat the expectation of the normalized loss function

\[
L_n^*(t, \theta) = n\, \frac{L_n(t, \theta) - c_{0n}(\theta)}{c_{2n}(\theta)},
\]

where we assume

\[
L_n^*(t, \theta) = n\{(t - \theta)^2 + o((t - \theta)^2)\},
\]

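and $o$ is assumed to hold uniformly in $n$ as $t \to \theta$. Then,

\[
\liminf_{n \to \infty}
\frac{E\{L_n^*(T_n, \theta)\}}{nE\left\{\min\left[(T_n - \theta)^2, \dfrac{k^2}{n}\right]\right\}} \ge 1,
\]

\[
\lim_{k \to \infty} \liminf_{n \to \infty}
\frac{E\{L_n^*(T_n, \theta)\}}{nE\left\{\min\left[(T_n - \theta)^2, \dfrac{k^2}{n}\right]\right\}} \ge 1.
\]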

If $\sqrt{n}(T_n - \theta)$ has a limiting distribution with second moment $\sigma^2(\theta)$, it follows
that

\[
\lim_{k \to \infty} \lim_{n \to \infty}
nE\left\{\min\left[(T_n - \theta)^2, \frac{k^2}{n}\right]\right\} = \sigma^2(\theta)
\]

and the asymptotic variance $\sigma^2(\theta)$ may be regarded as a lower bound for the
normalized risk function.

On the other hand, if $P\{|T_n - \theta| > k\} = o(1/n)$ for each $k$, it is possible to
show that

\[
\liminf_{n \to \infty} \frac{nE\{(T_n - \theta)^2\}}{E\{L_n^*(T_n, \theta)\}} \ge 1,
\]

and then the normalized risk is sandwiched between the real variance (normalized)
and the asymptotic variance. A similar discussion is given by Hodges and
Lehmann [19].

I believe that without unreasonable modifications the standard derivations of
the asymptotic normal distribution of the maximum likelihood estimates can be
used to show that for the maximum likelihood estimates $\lim_{n \to \infty} E\{L_n^*(T_n, \theta)\}$
is equal to the asymptotic variance. As far as I know, no such proof exists yet in
the literature.

The above discussion extends easily to the $k$-dimensional parameter case
where the role of the asymptotic variance is played by an expression of the form
$\sum_{i,j} a_{ij}(\theta)\sigma_{ij}(\theta)$. Here $A = \|a_{ij}\|$ is a nonnegative symmetric matrix whose
elements correspond to the second-order partial derivatives of the loss function
at $\theta$ (provided that these derivatives or their ratios converge as $n \to \infty$), and
$\|\sigma_{ij}(\theta)\|$ is the asymptotic covariance matrix.

A technique that had been used in previous attempts to establish the effi- 
ciency of maximum likelihood estimates was the derivation of a lower bound for 
the variance of an estimate and the proof that this lower bound was “‘asymp- 
totically” attained by the maximum likelihood estimates. 

Results in this direction were apparently first obtained by Fréchet [20] and 
Darmois [21] and later given by Cramér [22] and Rao [23] and called the Cramér- 
Rao inequality. Savage [24] has tentatively suggested alternatively using the 
name “Information inequality’’. These results were extended in various direc- 
tions by Bhattacharya [25], [26], Barankin [27], Wolfowitz [28], Seth [29], 
Chapman and Robbins [30], Kiefer [31], and Fraser and Guttman [32]. Be- 
cause these results invoked regularity conditions on the estimates, the possi- 
bility of superefficiency was hidden. Let us consider the following form of this 
result which does not use regularity conditions on the estimates. (This form and 
a variant of it were communicated to me by Charles Stein and Herman Rubin, 
respectively. ) 

First, we consider the nonasymptotic case where the parameter space $\Omega$ is a
subset of the real line containing the origin as an inner point. Let us define
Fisher’s information by

\[
I(\theta) = E_\theta\left\{\left[\frac{\partial \log f(X, \theta)}{\partial \theta}\right]^2\right\},
\]

where $E_\theta$ represents expectation with respect to the distribution determined by
$\theta$. We digress slightly to point out that $I(\theta)$ is additive. That is, if several independent
observations are combined, the corresponding information is the sum
of the individual informations. In particular, when $n$ independent observations
are taken on a chance variable $X$, the information is multiplied by $n$.
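
The additivity of the information can be verified directly in a simple case. The sketch below assumes, for illustration only, a Bernoulli variable with success probability $\theta$, for which the information in one observation is $1/[\theta(1 - \theta)]$; it computes the expected squared derivative of the log-likelihood for $n$ observations by enumerating all outcomes. The function names are hypothetical.

```python
import itertools
import math


def score(xs, theta):
    """Derivative with respect to theta of the Bernoulli log-likelihood of xs."""
    return sum(x / theta - (1 - x) / (1 - theta) for x in xs)


def information(n, theta):
    """E_theta[score^2] for n independent Bernoulli(theta) observations, by enumeration."""
    total = 0.0
    for xs in itertools.product((0, 1), repeat=n):
        prob = math.prod(theta if x else 1 - theta for x in xs)
        total += prob * score(xs, theta) ** 2
    return total


theta = 0.3
info_one = information(1, theta)    # equals 1 / (theta * (1 - theta))
info_four = information(4, theta)   # equals 4 times the single-observation value
print(info_one, info_four)
```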

Now let $T(X)$ be an estimate based on the observation $X$. Under mild conditions
on the distribution of $X$ (and not on $T$) we have

LEMMA 1. For every $\epsilon$, $0 < \epsilon < 1$, and any estimate $T$,


sup [Eo{(T(X) — 6)*}J@] 2>1—« 
—a<t<a 
if 
1 f° dé 
2a L41(6) = 
Otherwise, 
22 
sup Ky{(T(X) — 6)*} =“. 
—a<b<a 4 
This result can be applied to the large-sample case. To deal with estimates
which may behave well asymptotically, but which may have large or even infinite
variances, we introduce the truncated estimate $T^*$,

\[
T^* = T \ \text{ if } |T| \le 2a; \qquad T^* = 2a \ \text{ if } T > 2a; \qquad T^* = -2a \ \text{ if } T < -2a.
\]

Since $\min[(T - \theta)^2, 16a^2] \ge (T^* - \theta)^2$ for $-a < \theta < a$, we can easily derive
the following theorem for the case of $n$ independent observations on $X$.

THEOREM 1.

\[
\lim_{k \to \infty} \liminf_{n \to \infty}
\sup_{-k/4\sqrt{n} < \theta < k/4\sqrt{n}}
E_\theta\left\{ nI(\theta) \min\left[(T_n - \theta)^2, \frac{k^2}{n}\right] \right\} \ge 1
\]

if $I(\theta)$ is measurable and bounded away from 0 in some neighborhood of $\theta = 0$.
(This statement might be easier to read if it were weakened by replacing, under
“sup,” the interval $-k/4\sqrt{n} < \theta < k/4\sqrt{n}$ by $-\delta < \theta < \delta$.)

This result clearly allows for the possibility of superefficiency. It is weaker 
than Le Cam’s results, since it does not confine superefficiency to a set of measure 
zero. On the other hand, this statement fits in very well with our discussion of 
the normalized risk functions. It states that for an arbitrary estimate the recipro- 
cal of the information is “essentially” asymptotically a lower bound for the asymp- 
totic variance and hence for the normalized risk function. This, together with the 
above-mentioned conjecture that for the maximum likelihood estimate, the 
normalized risk approaches the asymptotic variance (which coincides with the 
reciprocal of the information), would give the ‘‘essential”’ efficiency of the maxi- 
mum likelihood estimate from the normalized risk-function point of view. 

Theorem 1 also has the advantage that it can be easily extended to the case
where the independent observations are not necessarily from the same population.
If the average information per observation is given by

\[
\bar{I}_n(\theta) = \frac{1}{n}\,[I_1(\theta) + \cdots + I_n(\theta)],
\]

where $I_j(\theta)$ is the information corresponding to the $j$th observation or experiment,
we can replace $I(\theta)$ in Theorem 1 by $\bar{I}_n(\theta)$, provided $\bar{I}_n(\theta)$ is measurable
and

\[
\lim_{n \to \infty}\ \inf_{-k/4\sqrt{n} < \theta < k/4\sqrt{n}} \bar{I}_n(\theta) > 0
\]

for each $k$. The case where

\[
\lim_{n \to \infty}\ \sup_{-k/4\sqrt{n} < \theta < k/4\sqrt{n}} \bar{I}_n(\theta) = 0
\]

for each $k$ gives no difficulty.


5. Optimal designs for estimating parameters. Suppose that there is available
a class of experiments $\{E\}$. A design will consist of a selection of $n$ of these
experiments to be performed independently. Suppose that the outcome of each
experiment depends only on a real-valued parameter $\theta$ which is to be estimated.
We shall assume that the true value of $\theta$ is approximately known so that it
makes sense to consider locally optimal designs. That is to say, we shall be
interested in selecting $n$ experiments so that an estimate of $\theta$, based on the
outcomes, will be very good if $\theta$ is close to some specified value $\theta^0$.

If $n$ is large, it seems reasonable to select these $n$ experiments, $E_1, E_2, \ldots, E_n$,
so as to make the sum of the corresponding informations $\sum_{i=1}^n I(E_i, \theta^0)$
large. If $I(E, \theta^0)$ is maximized by an experiment $E_0$, it pays to repeat the
experiment $E_0$ $n$ times. Then, by the Cramér-Rao type of theorem we treated,
the asymptotic variance for any design is at least as large as

$$\frac{n}{\sum_{i=1}^n I(E_i, \theta^0)} \ge \frac{1}{I(E_0, \theta^0)},$$

which is the asymptotic variance for the maximum likelihood estimate based on $n$
repetitions of $E_0$. Furthermore, if the conjecture that for maximum likelihood
estimates, the asymptotic variance is equal to the normalized risk is correct,
then the normalized risk is asymptotically a minimum for this design.

While the above problem is not very deep, there are certain remarks which
are relevant to the extension of this problem to the multidimensional parameter
case. First of all, it is quite possible that $I(E, \theta^0)$ does not attain its maximum.
A trivial case is the following: Suppose that $E_\sigma$ corresponds to observing a
normal deviate with mean $\theta$ and variance $\sigma^2$, and suppose that all $E_\sigma$ are
available for $\sigma > 1$. Here, $I(E_\sigma, \theta^0) = 1/\sigma^2$ can be made arbitrarily close to 1 but
cannot equal 1. It is apparent that the theoretical difficulty posed by this
situation is neither significant nor important.

In general, some experiments are more costly than others, and the formulation
involving the selection of a preassigned number of experiments may reasonably
be changed to that of selecting an arbitrary number of experiments whose
total cost is preassigned. Here, we would attempt to make $\sum I(E_i, \theta^0)$ large,
subject to the restriction $\sum c(E_i) = k$, where $c(E)$ is the cost of the experiment
$E$. Rewriting the above as

$$\sum_i I(E_i, \theta^0) = \sum_i \left[ \frac{I(E_i, \theta^0)}{c(E_i)} \right] c(E_i),$$

it is evident that we should select that $E_0$ which maximizes $I(E, \theta^0)/c(E)$,
the information per unit cost, and repeat $E_0$, $k/c(E_0)$ times.
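The selection rule just described amounts to a one-line maximization. The sketch below is only an illustration: the experiment names, their informations and costs, and the budget are made-up numbers, and the rounding of $k/c(E_0)$ to a whole number of repetitions is ignored, as it is in the text.

```python
# Hypothetical experiments: name -> (information I(E, theta0), cost c(E)).
experiments = {
    "E_a": (4.0, 5.0),
    "E_b": (1.0, 1.0),
    "E_c": (9.0, 10.0),
}
budget = 20.0  # the preassigned total cost k


def information_per_unit_cost(name):
    """Ratio I(E, theta0) / c(E) for the named experiment."""
    info, cost = experiments[name]
    return info / cost


# Choose the experiment with the largest information per unit cost ...
best = max(experiments, key=information_per_unit_cost)
info, cost = experiments[best]

# ... and spend the whole budget repeating it.
repetitions = budget / cost
total_information = repetitions * info
print(best, repetitions, total_information)
```

With these numbers the cheapest experiment wins even though it carries the least information per trial, because the comparison is made per unit of cost rather than per observation.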

Let us now extend the problem to the following case. Suppose that it is desired
to estimate a parameter $\theta_1$, but the distribution of the outcomes of the available
experiments depends not only on $\theta_1$, but also on $\theta_2, \ldots, \theta_k$. A special case of
this would be that of estimating the slope $\beta$ of the regression line of $Y$ on $x$,
where $Y = \alpha + \beta x + u$, $\mathcal{L}(u) = N(0, 1)$. Each level $x$ represents an experiment
$E_x$; then let us assume that one has available the set of $E_x$ for which $-1 \le x \le 1$.
It is well known that in this special example the optimal experiment consists of
performing $E_1$ and $E_{-1}$ each half of the time.

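To formulate this problem properly, we first note that in the case of $k$
parameters, the information is replaced by the information matrix

$$I(\theta) = \left\| E_\theta\left\{ \frac{\partial \log f(X, \theta)}{\partial \theta_i}\, \frac{\partial \log f(X, \theta)}{\partial \theta_j} \right\} \right\|, \qquad i, j = 1, 2, \ldots, k.$$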


The information matrix $I(\theta)$ has the additive property; i.e., the information
matrix corresponding to the outcome of several independent experiments $E_i$
is equal to the sum of the corresponding information matrices $\sum I(E_i, \theta)$.
Another property of interest is the following: Consider the randomized
experiment where $E_i$ is performed with probability $p_i$. Then, the information matrix
for the randomized experiment is given by the average $\sum p_i I(E_i, \theta)$.

Let $I_{ij}(\theta)$ represent the $(i, j)$ term of $I(\theta)$ and let $I^{ij}(\theta)$ be the $(i, j)$ term of
$I^{-1}(\theta)$. As $1/I(\theta)$ represented the asymptotic variance in the one-dimensional
case, so $I^{-1}(\theta)$ represents the asymptotic covariance matrix in the $k$-dimensional
case. In particular, $I^{11}(\theta)$ represents the asymptotic variance of $\sqrt{n}(\hat\theta_1 - \theta_1)$.

It now becomes very natural to formulate our problem as being that of
selecting $n$ experiments to minimize

$$\left[ \sum_{i=1}^n I(E_i, \theta^0) \right]^{11}.$$

We may equivalently minimize the $(1, 1)$ element of the inverse of the average
information per observation, i.e., we minimize

$$\bar I_n^{11}(\theta^0) = \left[ \frac{1}{n} \sum_{i=1}^n I(E_i, \theta^0) \right]^{11}.$$

Now the expression on the right-hand side corresponds to the randomized
experiment where each $E_i$ is performed with probability $1/n$. By taking $n$ large enough,
we can approximate each randomized experiment arbitrarily closely. Hence, we
might reformulate our problem as that of selecting that randomized experiment
for which $I^{11}(E, \theta^0)$ is minimized.
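The regression example mentioned earlier gives a small concrete check of this formulation. The sketch below is an illustration under stated assumptions: the parameter vector is taken to be $(\alpha, \beta)$ with the slope $\beta$ as the parameter of interest, the information matrix of $E_x$ is the standard one for a unit-variance normal error, and the two designs compared are hypothetical choices. The function names are not from the text.

```python
def info_matrix(x):
    """Information matrix of E_x for (alpha, beta) when
    Y = alpha + beta * x + u with u ~ N(0, 1)."""
    return [[1.0, x], [x, x * x]]


def average_information(design):
    """Information matrix of the randomized experiment {level x: probability}."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for x, p in design.items():
        ix = info_matrix(x)
        for r in range(2):
            for c in range(2):
                m[r][c] += p * ix[r][c]
    return m


def slope_variance(design):
    """(beta, beta) element of the inverse of the average information matrix."""
    m = average_information(design)
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return m[0][0] / det


endpoints = {-1.0: 0.5, 1.0: 0.5}                    # E_1 and E_{-1}, half each
three_point = {-1.0: 1 / 3, 0.0: 1 / 3, 1.0: 1 / 3}  # an alternative design

print(slope_variance(endpoints), slope_variance(three_point))
```

In this two-parameter case the quantity being minimized reduces to the reciprocal of the variance of $x$ under the design, which is why putting all the weight on the two endpoints does better than spreading it over interior levels.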







Each information matrix is nonnegative definite symmetric and may be identi- 
fied with the point in k(k + 1)/2-dimensional space whose coordinates are the 
elements on and below the main diagonal of the matrix. The class of matrices 
corresponding to the randomized experiments is the convex set generated by the 
matrices of the pure experiments. Hence, our problem reduces to that of minimiz- 
ing a function on a convex set. 

$I^{11}$ is a continuous function of $I$ on the set of positive definite symmetric
matrices. However, $I^{11}$ is not defined for singular matrices. If the distribution of
the outcome of an experiment $E$ depended on less than $k$ independent parameters,
the information matrix would be singular. Nevertheless, in this case, it can be
shown that it would be meaningful to redefine $I^{11}$ by $\lim_{\lambda \to 0+} (I + \lambda A)^{11}$, where
$A$ is an arbitrary positive definite symmetric matrix. We then have [33]:

THEOREM 1. If the set $R$ of randomized information matrices $I(\theta^0)$ is closed and
bounded, the function $I^{11}(\theta^0)$ attains its minimum on $R$ at a matrix which is a
convex combination of $r \le k$ of the information matrices corresponding to the
nonrandomized experiments.

This theorem states that there is a locally optimal design for large n which 
involves at most $k$ of the original experiments. This result considerably reduces
the computational problem involved in computing the optimal design. It con- 
stitutes a generalization of a similar result by Elfving [34], which applies to linear 
regression problems with normal deviates. In connection with his result, Elfving 
indicated an elegant geometrical technique of finding the optimal solution. His 
technique applies to our more general problem if all the information matrices 
resemble those of the regression case; i.e., if the typical information matrix for 
each experiment can be expressed as $\| x_i x_j \|$. In fact, this case occurs quite
frequently in applications which are not normal linear regression.

Finally, this result extends to the case where one is interested in estimating 
s out of the k parameters involved in the experiments. Then the optimal design 
involves no more than k + (k — 1) + --- + (k — s + 1) experiments. This 
last result is of limited computational applicability if k and s are not small 
numbers. 


6. Testing simple hypotheses. The easiest problem in statistical inference is
that of testing a simple hypothesis against a simple alternative. Suppose that the
hypothesis $H_0$ specifies that $n$ independently distributed observations,
$X_1, X_2, \ldots, X_n$, have density $f_0(x)$, whereas the alternative $H_1$ specifies the density
$f_1(x)$. It is well known that the class of best tests are the likelihood ratio tests
characterized by critical regions which contain all points where the ratio

$$\prod_{i=1}^n f_1(X_i) \Big/ \prod_{i=1}^n f_0(X_i)$$

exceeds some constant $c$ and a subset of those points for which the ratio is equal
to $c$. It is peculiar that in this example, where the small-sample theory is so well
understood, the large-sample theory yields results of interest.

First, let us note that the above test can be considered to be one that is based
on $\bar Y_n = (1/n) \sum_{i=1}^n Y_i$, where $Y_i = \log [f_1(X_i)/f_0(X_i)]$. But for tests based on
averages of observations, Cramér’s results, which were expressed in Section 3,
are applicable. These results also apply to tests which are not necessarily
likelihood ratio tests. In what follows, we shall assume that $Y_i$ is not necessarily of the
above form, but that the test consists of rejecting $H_0$ if $\bar Y_n > a_n$, and that
$\mu_0 = E(Y \mid H_0) < E(Y \mid H_1) = \mu_1$.

The probabilities of the two types of error are given by

$$\alpha_n = P\{\bar Y_n > a_n \mid H_0\} \quad \text{and} \quad \beta_n = P\{\bar Y_n \le a_n \mid H_1\}.$$


There are several principles which may be invoked for selecting $a_n$. One of these
is that of minimizing $\alpha_n + \lambda\beta_n$ for some $\lambda > 0$. This principle would be
especially meaningful if there were an a priori probability $\xi$, $0 < \xi < 1$, attached to
$H_0$. Then, if $l_{ij}$ represents the loss due to accepting $H_i$ when $H_j$ is correct, the
risk would be given by

$$R = \xi l_{00}(1 - \alpha_n) + \xi l_{10}\alpha_n + (1 - \xi) l_{01}\beta_n + (1 - \xi) l_{11}(1 - \beta_n)$$

$$= \xi l_{00} + (1 - \xi) l_{11} + \xi (l_{10} - l_{00}) \left[ \alpha_n + \frac{(1 - \xi)(l_{01} - l_{11})}{\xi (l_{10} - l_{00})}\, \beta_n \right].$$

But for reasonable loss functions, $l_{01} - l_{11}$ and $l_{10} - l_{00}$ are positive. Hence,
minimizing $R$ is equivalent to minimizing $\alpha_n + \lambda\beta_n$, where

$$\lambda = \frac{(1 - \xi)(l_{01} - l_{11})}{\xi (l_{10} - l_{00})}.$$


Another situation in which it would be appropriate to use this criterion would
be one where it is desired to minimize some function $F(\alpha_n, \beta_n)$, where neither
$\partial F(0, 0)/\partial\alpha$ nor $\partial F(0, 0)/\partial\beta$ vanishes. Essentially, this boils down to requiring
that as $n \to \infty$, $\alpha_n$ and $\beta_n$ converge to zero at the same rate.

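Let

$$m_i(a) = \inf_t E\{e^{t(Y - a)} \mid H_i\}, \qquad i = 0, 1,$$

$$\rho(a) = \max\,[m_0(a), m_1(a)], \qquad \rho = \inf_{\mu_0 \le a \le \mu_1} \rho(a).$$

A consequence of Cramér’s result (see [35]) is
THEOREM 1.

$$\lim_{n\to\infty} \left[ \inf_{a_n} (\beta_n + \lambda\alpha_n) \right]^{1/n} = \rho \qquad \text{(independent of } \lambda\text{)}.$$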
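To see how these quantities behave in a familiar case, here is a worked sketch under an assumption that is not made in the text: $Y$ is taken to be normal with unit variance and mean $\mu_i$ under $H_i$. The moment generating function of a normal variable then gives everything in closed form.

```latex
% Assumption (illustrative only): Y ~ N(\mu_i, 1) under H_i, with \mu_0 < \mu_1.
\begin{align*}
E\{e^{t(Y-a)} \mid H_i\} &= \exp\!\left[t(\mu_i - a) + \tfrac{1}{2}t^2\right], \\
m_i(a) &= \inf_t \exp\!\left[t(\mu_i - a) + \tfrac{1}{2}t^2\right]
        = \exp\!\left[-\tfrac{1}{2}(a-\mu_i)^2\right]
        \quad (\text{attained at } t = a - \mu_i), \\
\rho(a) &= \max\left\{ e^{-(a-\mu_0)^2/2},\; e^{-(a-\mu_1)^2/2} \right\}, \\
\rho &= \inf_{\mu_0 \le a \le \mu_1} \rho(a)
      = \exp\!\left[-\tfrac{1}{8}(\mu_1-\mu_0)^2\right]
      \quad \text{at } a = \tfrac{1}{2}(\mu_0+\mu_1).
\end{align*}
```

In this sketch the best achievable weighted sum of the two error probabilities decays roughly like $\exp[-n(\mu_1 - \mu_0)^2/8]$, whatever the value of $\lambda$, and the minimizing cutoff sits halfway between the two means.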
This theorem permits us to compare the relative efficiency of two tests. For
the above test, $\beta_n + \lambda\alpha_n$ behaves roughly like $\rho^n$. Suppose that a similar test is
based on the average of another statistic $Z$. If $\rho^*$ is the corresponding value of $\rho$
for this new test, then

$$e = \frac{\log \rho^*}{\log \rho}$$

is a reasonable measure of the relative efficiency of the test based on $Z$ to the
test based on $Y$. The reason for this is that if $n_1$ and $n_2$ are large sample sizes for
which the $\alpha_{n_i} + \lambda\beta_{n_i}$ of the two tests are approximately equal, then $n_1/n_2$ is
close to $e$. In other words, the first test requires $e n_2$ observations to do as well as
the second. This measure of efficiency permits us not only to compare various
tests based on a given experiment, but also permits us to compare tests based on
different experiments.

In particular, let us consider the likelihood ratio test for a given experiment.
We designate the corresponding $\rho$ by $\rho_{LR}$, which can be shown to be given by

$$\rho_{LR} = \inf_t \int [f_1(x)]^t [f_0(x)]^{1-t}\, d\nu(x)$$

if $f_1(x)$ and $f_0(x)$ are the densities of $X$, with respect to the measure $\nu$, under
$H_1$ and $H_0$, respectively. Because of the character of the above-mentioned
measure of relative efficiency, it is natural to define the information in the experiment
by

$$I = -\log \rho_{LR}.$$
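For a discrete experiment the integral becomes a finite sum and the infimum can be found numerically. The following is a minimal sketch rather than a method taken from the text: the two Bernoulli densities are hypothetical, the function names are illustrative, all probabilities are assumed strictly positive, and the minimization uses a plain ternary search over $0 \le t \le 1$, which is adequate here because the sum is convex in $t$ and equals 1 at both endpoints.

```python
import math


def rho_of_t(t, f0, f1):
    """Sum over outcomes of f1(x)^t * f0(x)^(1 - t) for discrete densities."""
    return sum((q ** t) * (p ** (1 - t)) for p, q in zip(f0, f1))


def rho_lr(f0, f1, iterations=200):
    """Minimize the convex function t -> rho_of_t over [0, 1] by ternary search."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if rho_of_t(m1, f0, f1) < rho_of_t(m2, f0, f1):
            hi = m2
        else:
            lo = m1
    return rho_of_t((lo + hi) / 2, f0, f1)


f0 = [0.7, 0.3]   # hypothetical density under H0
f1 = [0.3, 0.7]   # hypothetical density under H1

rho = rho_lr(f0, f1)
information = -math.log(rho)
print(rho, information)
```

Because these two densities are mirror images of each other, the minimum falls at $t = 1/2$ and $\rho_{LR}$ equals $2\sqrt{0.21}$; for asymmetric pairs the minimizing $t$ moves away from the midpoint.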


Fisher’s measure of information also had the property that if two experiments
yield informations $I_1(\theta)$ and $I_2(\theta)$, where $I_1(\theta) = 2I_2(\theta)$, then one needs
approximately $2n$ observations on the second experiment to get results comparable to
those obtained with $n$ observations on the first experiment for $n$ large. It is
interesting to note that while Fisher’s measure of information is additive, the
above is not. In fact, it has the following properties:

(1) The information derived from $n$ independent observations on a chance
variable is $n$ times the information from one observation.

(2) The information derived from observations on several independent chance
variables is less than or equal to the sum of the corresponding informations.

It occasionally happens in practice that it is important to obtain $\beta$ very small,
whereas a relatively large value of $\alpha$, like .05 or .10, is not disastrous. In such
cases, it makes sense to consider in our large-sample approach the problem where
one minimizes $\beta$ subject to fixed $\alpha$. Let $\beta_n^*$ be the value of $\beta_n$ which corresponds
to a fixed value of $\alpha$, say $\alpha_0$, $0 < \alpha_0 < 1$. We have, as another consequence of
Cramér’s result,

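THEOREM 2.

$$\lim_{n\to\infty} (\beta_n^*)^{1/n} = \rho^* = m_1(\mu_0) \qquad \text{(independent of } \alpha_0\text{)},$$

where $\mu_0 = E(Y \mid H_0)$.
In particular, for the likelihood ratio test, it is easy to show that we obtain
$\rho^*_{LR}$, which is given by

$$\rho^*_{LR} = m_1(\mu_0) = e^{\mu_0} = \exp\left[ \int f_0(x) \log \frac{f_1(x)}{f_0(x)}\, d\nu(x) \right].$$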







This result was first obtained by Charles Stein [36]. Here again, it makes sense
to define a corresponding measure of information by

$$I^* = -\log \rho^*_{LR} = -\int f_0(x) \log \frac{f_1(x)}{f_0(x)}\, d\nu(x).$$

It is of interest to note that $I^*$ represents one of the Kullback-Leibler
information numbers [37]; also

$$I^{**} = \int f_1(x) \log \frac{f_1(x)}{f_0(x)}\, d\nu(x)$$

would arise naturally if $\beta_n$ were kept fixed and $\alpha_n \to 0$. The Kullback-Leibler
numbers do have the additive property. Incidentally, the above characterization
of the Kullback-Leibler numbers implies that they do exceed $I = -\log \rho_{LR}$.

Until now, we have not discussed sequential analysis from a large-sample point
of view. At a first naive glance, it may seem as though the very nature of
sequential analysis is such as to rule out large-sample theory. That this is not so
becomes clear when one considers that reducing the cost of sampling should
increase the expected sample size. In fact, let us suppose that the cost per
observation is $c$. Consider the Bayes procedure corresponding to a fixed a priori
probability $\xi$ that $H_0$ is correct. The risk function is given by

$$R_0 = l_{00} + \alpha(l_{10} - l_{00}) + cE(n \mid H_0),$$
$$R_1 = l_{11} + \beta(l_{01} - l_{11}) + cE(n \mid H_1).$$

The Bayes risk, $\xi R_0 + (1 - \xi)R_1$, is minimized by Wald’s sequential probability
ratio test [38]. As $c \to 0$, $E(n \mid H_0)$ and $E(n \mid H_1) \to \infty$, but

$$\xi(R_0 - l_{00}) + (1 - \xi)(R_1 - l_{11}) \to 0.$$


An elementary application of Wald’s inequalities concerning the operating
characteristic function gives
THEOREM 3.

$$\lim_{c\to 0} \frac{R_0 - l_{00}}{c \log (1/c)} = \frac{1}{I^*}, \qquad \lim_{c\to 0} \frac{R_1 - l_{11}}{c \log (1/c)} = \frac{1}{I^{**}}.$$

Note that these limits do not depend on $l_{10} - l_{00}$ nor on $l_{01} - l_{11}$. This is due to
the fact that as $c \to 0$, the main part of the risk is the cost of sampling.

It is rather striking that the notions of information, which are natural for the 
sequential and nonsequential cases, are not identical. Upon some consideration, 
however, it is not surprising. In the sequential case, after many observations are 
taken, one is almost sure which hypothesis is correct. Then if $H_0$ seems correct,
the remaining observations may be selected from an experiment for which the
corresponding Kullback-Leibler information $I^*$ is large. In the nonsequential
case, the experiment to be performed must be decided on before any data are
taken. It is natural that the corresponding information should differ from $I^*$
and $I^{**}$.







It is of interest to note that as the hypotheses $H_0$ and $H_1$ get closer to one
another, the three measures of information behave in the following fashion:

$$\frac{I^*}{I} \to 4, \qquad \frac{I^{**}}{I} \to 4.$$
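One concrete check of how the three measures compare is the case of two unit-variance normal densities. The choice of densities below and the symbol $\delta$ for the difference in means are assumptions made only for this illustration; the three quantities are computed directly from the definitions of $I$, $I^*$ and $I^{**}$ given above.

```latex
% Assumption (illustrative only): f_0 = N(0,1), f_1 = N(\delta,1), \nu = Lebesgue measure.
\begin{align*}
\int [f_1(x)]^t [f_0(x)]^{1-t}\,dx &= \exp\!\left[-\tfrac{1}{2}t(1-t)\delta^2\right], \\
\rho_{LR} &= \inf_t \exp\!\left[-\tfrac{1}{2}t(1-t)\delta^2\right] = e^{-\delta^2/8},
  \qquad I = -\log\rho_{LR} = \tfrac{1}{8}\delta^2, \\
I^* &= \int f_0(x)\log\frac{f_0(x)}{f_1(x)}\,dx
     = E_0\!\left[\tfrac{1}{2}\delta^2 - \delta X\right] = \tfrac{1}{2}\delta^2, \\
I^{**} &= \int f_1(x)\log\frac{f_1(x)}{f_0(x)}\,dx
     = E_1\!\left[\delta X - \tfrac{1}{2}\delta^2\right] = \tfrac{1}{2}\delta^2.
\end{align*}
```

In this example $I^* = I^{**} = 4I$ for every $\delta$, not only in the limit, which is consistent with the earlier remark that the Kullback-Leibler numbers exceed $I$.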
7. Composite hypotheses. A classical result in the large-sample theory applied
to tests of composite hypotheses is that of Wilks [39]. It states that$^8$

$$\mathcal{L}(-2 \log \lambda_n) \to \mathcal{L}(\chi^2_{k-r})$$

if $\lambda_n$ is the likelihood ratio based on $n$ independent observations for the test that
a parameter $\theta$ lies on a specified $r$-dimensional hyperplane of $k$-dimensional space
and the hypothesis is true. It is striking that this result does not involve the
distribution of the data except in that mild regularity conditions on the distribution
are required.

Many tests of composite hypotheses are not of this simple form. For example,
it may be desired to test whether $\theta$ lies in the first quadrant of the plane, or it
may be desired to test whether $\theta$ lies above a hyperplane or even whether $\theta$ lies
inside a sphere.

For these problems, first suggested to me by Leonid Hurwicz, a natural
generalization of Wilks’ result is easily obtained.

Let $f(x, \theta)$ represent the density of the data. Suppose that $\theta$ lies in
$k$-dimensional space and let $\omega$ and $\tau$ be two disjoint subsets of this space. We are
interested in testing $H_0: \theta \in \omega$ against the alternative $H_1: \theta \in \tau$. Let

$$P_\omega(X_1, X_2, \ldots, X_n) = \sup_{\theta \in \omega} \prod_{i=1}^n f(X_i, \theta).$$


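The standard definition of the likelihood ratio is given by

$$\lambda_n = \frac{P_\omega(X_1, X_2, \ldots, X_n)}{P_{\omega \cup \tau}(X_1, X_2, \ldots, X_n)}.$$

It is somewhat more convenient to treat a more symmetric form

$$\lambda_n^* = \frac{P_\omega(X_1, X_2, \ldots, X_n)}{P_\tau(X_1, X_2, \ldots, X_n)}.$$

These are related by

$$\lambda_n = \lambda_n^* \text{ if } \lambda_n^* \le 1; \qquad \lambda_n = 1 \text{ if } \lambda_n^* > 1.$$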


We call a set $C$ positively homogeneous if $x \in C$ implies $ax \in C$ for all $a > 0$. We
say that $\omega$ is approximated by a positively homogeneous set $C_\omega$ if

$$\inf_{x \in C_\omega} |x - y| = o(|y|) \quad \text{for } y \in \omega,$$

and

$$\inf_{y \in \omega} |x - y| = o(|x|) \quad \text{for } x \in C_\omega.$$

Let $I(\theta)$ represent Fisher’s information matrix.


$^8$ $\chi^2_{k-r}$ represents chi-square with $k - r$ degrees of freedom.







Under certain regularity conditions on $f(x, \theta)$, we have the following result
[40]:

THEOREM 1. If $\omega$ and $\tau$ are approximated by two disjoint positively homogeneous
sets $C_\omega$ and $C_\tau$ and the true value of $\theta$ is at the origin, then the distribution of
$-2 \log \lambda_n^*$ is the same as it would be for the case where $\mathcal{L}(X_i) = N(\theta, I(0)^{-1})$ and $\omega$
and $\tau$ are replaced by $C_\omega$ and $C_\tau$.

The advantage of this result lies in the fact that the case of normally distrib- 
uted data is relatively simple to treat. 

It is now easy to show that if $\omega$ is a smooth $r$-dimensional surface and $\tau$ is
the rest of the $k$-dimensional space and $\theta \in \omega$, then

$$\lim \mathcal{L}(-2 \log \lambda_n) = \lim \mathcal{L}(-2 \log \lambda_n^*) = \mathcal{L}(\chi^2_{k-r}).$$

It is also easy to show that if $\omega$ is the set on one side of a smooth
$(k - 1)$-dimensional surface, $\tau$ is the rest of $k$-dimensional space, and $\theta$ is on the boundary,
then

$$\lim \mathcal{L}(-2 \log \lambda_n^*) = \mathcal{L}(u\chi^2_1); \qquad \lim \mathcal{L}(-2 \log \lambda_n) = \mathcal{L}(v\chi^2_1),$$

where $u$ is independent of $\chi^2_1$ and takes on the values 1 and $-1$ with
probability $\tfrac{1}{2}$ and $v = \tfrac{1}{2}(u + 1)$.

In particular, this case applies to testing whether $\theta$ lies inside or outside a
sphere, and to testing whether $\theta$ lies above or below a hyperplane.
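The boundary case can be illustrated by simulation in the simplest setting. The sketch below rests on assumptions that are not in the text: the data are unit-variance normal with unknown mean $\theta$, $\omega$ is the half-line $\theta \le 0$, $\tau$ is $\theta > 0$, and the true value is $\theta = 0$. In that setting both statistics depend on the data only through the sample mean, so the sample mean is drawn directly; the function and variable names are illustrative.

```python
import random


def statistics_for_sample_mean(xbar, n):
    """Return (-2 log lambda_n*, -2 log lambda_n) for N(theta, 1) data when
    testing theta <= 0 against theta > 0, given the sample mean."""
    z2 = n * xbar * xbar
    symmetric = z2 if xbar > 0 else -z2   # sign follows the side xbar falls on
    standard = z2 if xbar > 0 else 0.0    # zero whenever xbar lies in omega
    return symmetric, standard


random.seed(0)
n, replications = 50, 20000
draws = [
    statistics_for_sample_mean(random.gauss(0.0, n ** -0.5), n)
    for _ in range(replications)
]

# Share of negative values of the symmetric statistic (u = -1).
share_negative = sum(s < 0 for s, _ in draws) / replications
# Share of exact zeros of the standard statistic (v = 0).
share_zero = sum(t == 0.0 for _, t in draws) / replications
# Mean of the absolute value, to compare with E(chi-square_1) = 1.
mean_abs = sum(abs(s) for s, _ in draws) / replications

print(share_negative, share_zero, mean_abs)
```

Both shares should come out near one half and the mean absolute value near one, which matches a chi-square variable with one degree of freedom carrying a random sign in the symmetric form and being replaced by zero half the time in the standard form.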

In the problem where one is interested in whether $\theta$ is in the first quadrant or
not, the following is the situation. If $\theta$ is on the positive part of either axis,

$$\lim \mathcal{L}(-2 \log \lambda_n^*) = \mathcal{L}(u\chi^2_1).$$

If $\theta$ is at the origin, the limiting distribution depends on $I(0)$ and is not difficult
to evaluate numerically.


8. Summarizing remarks. The topic of this paper is so broad and current re- 
search in it is so vigorous that it is impossible for me to do more than men- 
tion a few of those notions in it that have been of special interest to me. I have 
tried to give some feeling for those aspects which attract me to the subject and, 
in so doing, I have neglected a considerable amount of important work done by 
many people including among others Neyman and Wald. 


REFERENCES 


[1] W. G. Cochran, “The $\chi^2$ distribution for the binomial and Poisson series, with small
expectations,” Ann. Eugenics, Vol. 7 (1936), pp. 207-217.

[2] H. Chernoff and H. Rubin, “Asymptotic properties of limited-information estimates
under generalized conditions,” Studies in Econometric Method, William C. Hood
and T. C. Koopmans (eds.), Cowles Commission Monograph 14, John Wiley and
Sons, New York, 1953, pp. 200-212.

[3] H. B. Mann and A. Wald, “On stochastic limit and order relationships,” Ann. Math.
Stat., Vol. 14 (1943), pp. 217-226.

[4] H. Rubin, “Convergence of probability measures on completely regular spaces,”
unpublished.







[5] B. V. Gnedenko and A. N. Kolmogoroff, Limit distributions for sums of independent
random variables, trans. by K. L. Chung, Addison-Wesley, Cambridge, Mass.,
1954, 264 pp.
[6] M. Fréchet, “Les éléments aléatoires de nature quelconque dans un espace distancié,”
Ann. Inst. H. Poincaré, Vol. 10 (1948), pp. 215-230.
[7] H. Rubin, “Systems of linear stochastic equations,” unpublished Ph.D. dissertation,
University of Chicago, 1948, 50 pp.
[8] M. D. Donsker, “Justification and extension of Doob’s heuristic approach to the
Kolmogorov-Smirnov theorems,” Ann. Math. Stat., Vol. 23 (1952), pp. 277-281.
[9] J. L. Doob, “Heuristic approach to the Kolmogoroff-Smirnov theorems,” Ann. Math.
Stat., Vol. 20 (1949), pp. 393-403.
[10] H. Cramér, “Sur un nouveau théorème-limite de la théorie des probabilités,”
Actualités Sci. Ind., No. 736, Paris (1938).
[11] C. G. Esseen, “Fourier analysis of distribution functions,” Acta Math., Vol. 77 (1945),
pp. 1-125.
[12] W. Feller, “Generalization of a probability limit theorem of Cramér,” Trans. Amer.
Math. Soc., Vol. 54 (1943), pp. 361-372.
[13] R. A. Fisher, “On the mathematical foundations of theoretical statistics,” Philos.
Trans. Roy. Soc. London, Series A, Vol. 222 (1922), pp. 309-368.
[14] R. A. Fisher, “Theory of statistical estimation,” Proc. Cambridge Philos. Soc., Vol.
22, Part 5 (1925), pp. 700-725.
[15] L. Le Cam, “On some asymptotic properties of maximum likelihood estimates and
related Bayes’ estimates,” Univ. California Publ. Stat., Vol. 1 (1953), pp. 277-330.
[16] A. Wald, Statistical decision functions, John Wiley and Sons, New York, 1950, 179 pp.
[17] J. Wolfowitz, “The method of maximum likelihood and the Wald theory of decision
functions,” Proc. Roy. Dutch Acad. Sci., Vol. 56 (1953), pp. 114-119.
[18] J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior, 2d
ed., Princeton University Press, Princeton, N. J., 1947, 641 pp.


[19] J. L. Hodges and E. L. Lehmann, “The Robbins-Monro process in the bounded case,”
to be published in the Third Berkeley Symposium Proceedings, University of
California Press.

[20] M. Fréchet, “Sur l’extension de certaines évaluations statistiques au cas de petits
échantillons,” Revue de l’Institut International de Statistique, 11 (1943), pp. 183-205.

[21] G. Darmois, “Sur les limites de la dispersion de certaines estimations,” Revue de
l’Institut International de Statistique, 13 (1945), pp. 9-15.

[22] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, Princeton,
N. J., 1946, 575 pp.

[23] C. R. Rao, “Information and accuracy obtainable in one estimation of a statistical
parameter,” Bull. Calcutta Math. Soc., Vol. 37 (1945), pp. 81-91.

[24] L. J. Savage, The Foundations of Statistics, John Wiley and Sons, Inc., New York
(1954), 294 pp.

[25] A. Bhattacharya, “On some analogues of the amount of information and their uses
in statistical estimation,” Sankhyā, Vol. 8 (1946), pp. 1-14.

[26] A. Bhattacharya, “On some analogues of the amount of information and their uses in
statistical estimation,” Sankhyā, Vol. 8 (1947), pp. 201-218.

[27] E. W. Barankin, “Locally best unbiased estimates,” Ann. Math. Stat., Vol. 20 (1949),
pp. 477-501.

[28] J. Wolfowitz, “Efficiency of sequential estimates,” Ann. Math. Stat., Vol. 18 (1947),
pp. 215-230.

[29] G. R. Seth, “On the variance of estimates,” Ann. Math. Stat., Vol. 20 (1949), pp. 1-27.

[30] D. G. Chapman and H. Robbins, “Minimum variance estimation without regularity
assumptions,” Ann. Math. Stat., Vol. 22 (1951), pp. 581-586.







[31] J. Kiefer, “On minimum variance estimators,” Ann. Math. Stat., Vol. 23 (1952), pp.
627-628.

[32] D. A. S. Fraser and I. Guttman, “Bhattacharya bounds without regularity
conditions,” Ann. Math. Stat., Vol. 23 (1952), pp. 629-631.

[33] H. Chernoff, “Locally optimal designs for estimating parameters,” Ann. Math. Stat.,
Vol. 24 (1953), pp. 586-602.

[34] G. Elfving, “Optimum allocation in linear regression theory,” Ann. Math. Stat., Vol.
23 (1952), pp. 255-262.

[35] H. Chernoff, “A measure of asymptotic efficiency for tests of a hypothesis based on
the sum of observations,” Ann. Math. Stat., Vol. 23 (1952), pp. 493-507.

[36] C. Stein, “Information and comparison of experiments,” unpublished.

[37] S. Kullback and R. A. Leibler, “On information and sufficiency,” Ann. Math. Stat.,
Vol. 22 (1951), pp. 79-86.

[38] A. Wald, Sequential Analysis, John Wiley and Sons, New York, 1947, pp. 60-62.

[39] S. S. Wilks, “The large sample distribution of the likelihood ratio for testing composite
hypotheses,” Ann. Math. Stat., Vol. 9 (1938), pp. 60-62.

[40] H. Chernoff, “On the distribution of the likelihood ratio,” Ann. Math. Stat., Vol. 25
(1954), pp. 573-578.







A “MIXED MODEL” FOR THE ANALYSIS OF VARIANCE$^1$


By Henry Scheffé
University of California, Berkeley


1. Summary. A “mixed model” is proposed in which the problem of the 
appropriate assumptions to make about the joint distribution of the random 
main effects and interactions is solved by letting this joint distribution follow 
from more basic and “natural” assumptions about the cell means. The expec- 
tations of the mean squares ordinarily calculated turn out, with suitable defi- 
nition of the variance components, to have the same values as those usually 
found in more restrictive models, and some of the customary tests and con- 
fidence intervals are justified, but some aspects appear to be novel. For example, 
the over-all test found for the fixed main effects and the associated
multiple-comparison method require Hotelling’s $T^2$.


2. Introduction. We consider $K$ replications of a two-way layout with $I$ rows
and $J$ columns ($I > 1$, $J > 1$, $K \ge 1$), the rows corresponding to levels of a
“Model I” [4] factor $A$, whose effects we wish to regard as fixed effects, and the
columns corresponding to the levels of a “Model II” factor $B$, whose effects we
wish to regard as random effects. We let $y_{ijk}$ denote the $k$th measurement in
the $i, j$ cell (the intersection of the $i$th row and $j$th column). Throughout this
paper, $i$ and $i'$, as subscripts or indices of summation, will range over the integers
from 1 to $I$; $j$, $j'$, and $j''$ will range from 1 to $J$, etc., unless otherwise indicated.

As an illustration, we may imagine an experiment involving $I$ different makes
of machines and $J$ workers regarded as a sample from a large population of
workers. Each worker is put on each machine for $K$ days during the experiment
and $y_{ijk}$ is a measurement of the output of the $j$th worker the $k$th day he is on the
$i$th machine. It is customary in the analysis of variance to write


(1) $y_{ijk} = \mu + a_i + b_j + c_{ij} + e_{ijk},$


where the general mean $\mu$ and the row effects $\{a_i\}$ are constants, about which we
may assume without loss of generality that $\sum_i a_i = 0$, and where the column
effects $\{b_j\}$, interactions $\{c_{ij}\}$, and “errors” $\{e_{ijk}\}$ are random variables about
whose joint distribution certain assumptions are made. The usual assumption
that the set $\{e_{ijk}\}$ is statistically independent of the set $\{b_j, c_{ij}\}$ seems acceptable
to the writer in many applications, but the further assumptions usually made on
the $\{b_j\}$ and $\{c_{ij}\}$ seem to him unsatisfactory,$^2$ as being ad hoc, or too restrictive,
or not sufficiently complete. For example, the usual assumption that the $\{c_{ij}\}$
are statistically independent of the $\{b_j\}$ is ad hoc, the frequent assumption that


Received February 11, 1955.
$^1$ This paper was prepared with the partial support of the Office of Naval Research. It
may be reproduced in whole or in part for any purpose of the United States government.
$^2$ An example is [12].









all the $\{c_{ij}\}$ are independent is too restrictive, and the additional assumptions
stated by those who take $\sum_i c_{ij} = 0$ (all $j$) are sometimes insufficient even to
determine the expected values of the mean squares usually employed.


3. The model. We propose to avoid the unpleasant assumptions as follows:
We will assume that


(2) $y_{ijk} = m_{ij} + e_{ijk},$


where the set of errors $\{e_{ijk}\}$ is statistically independent of the set $\{m_{ij}\}$ of
“true” cell means.

About the set of errors $\{e_{ijk}\}$, we assume that they are independently and
identically distributed with zero means and variance $\sigma_e^2$. (This assumption can
obviously be lightened without affecting the validity of the expected mean
squares and unbiased estimates derived in Section 5, which depends only on the
first and second moments of the set $\{m_{ij}, e_{ijk}\}$, and for which it is sufficient that
the set $\{e_{ijk}\}$ have zero means, zero correlations, and a common variance $\sigma_e^2$.)

The writer hopes that the assumptions about the joint distribution of the set
$\{m_{ij}\}$, to be stated below, will be found acceptable. The main effects $\{b_j\}$ and
interactions $\{c_{ij}\}$ will then be defined in terms of the $\{m_{ij}\}$ in a natural way, and
the joint distribution of the set $\{b_j, c_{ij}\}$ will thus be determined. Some parts of
this program and its implications have also been developed earlier by others, as
we shall be able to indicate more conveniently at the end of this paper.

Our basic assumption on the rectangular array of the $\{m_{ij}\}$ is that the $J$
columns are distributed independently like a vector random variable $m$ with
$I$ components, $m_1, \ldots, m_I$. Thus, in the above illustration of machines and
workers, for each worker in the population there is a vector whose $I$ components
are his “true means” on the $I$ machines, and the random vector $m$ has the
distribution generated by this population of workers, idealized as an infinite
population. The $J$ columns of the array $\{m_{ij}\}$ are the $J$ vectors belonging to the $J$
workers in the experiment, assumed to be a random sample from the population
of workers.

About the components $\{m_i\}$ of the random vector $m$, we shall always assume
that the $I$ variances are finite. Sometimes we shall also make the normality
assumption

($\mathcal{N}$): The $\{m_i\}$ have a joint normal distribution, and the $\{e_{ijk}\}$ are also jointly
normal.

We shall also have occasion to refer to a symmetry assumption

($\mathcal{S}$): The $\{m_i\}$ have equal variances and equal covariances.

We will refer to the assumption ($\mathcal{S}$) as a limiting case in which certain relations
become simpler or clearer, but we do not recommend it in applications, where
there is usually no real symmetry corresponding to this assumption. Thus, in our
illustration, two machines might be very similar (perhaps of the same make and
model), but very different from the other machines. Further objections to
assuming ($\mathcal{S}$) will arise when we consider below the finite analogue of the
infinite population of vectors associated with the random vector $m$.







4. Definition of effects and variance components. The “true” mean for the
$i$th level of the factor $A$ (the reader may find it easier to substitute “ ‘true’ mean
for the $i$th machine”) is defined to be

(3) $\mu_i = E(m_i),$

and the “true” general mean, to be

(4) $\mu = \mu_{\cdot},$

where the dot notation here and elsewhere signifies that the arithmetic average
has been taken over the subscript which the dot replaces, that is, $\mu_{\cdot} = \sum_i \mu_i / I$.
The main effect of the $i$th level of $A$, or the $i$th row effect, is defined to be

(5) $a_i = \mu_i - \mu,$

so that $\sum_i a_i = 0$. The “true” mean for the $j$th level of factor $B$ (for the $j$th
worker in the experiment) is defined to be $m_{\cdot j}$, and the main effect of the $j$th
level of $B$, or the $j$th column effect, to be

(6) $b_j = m_{\cdot j} - \mu.$

Finally, the interaction effect, $c_{ij}$, of the $j$th level of $B$ with the $i$th level of $A$
is defined to satisfy the equation

(7) $m_{ij} = \mu + a_i + b_j + c_{ij},$

or

(8) $c_{ij} = m_{ij} - m_{\cdot j} - \mu_i + \mu,$

so that $\sum_i c_{ij} = 0$ (all $j$).

We see that the J vectors with J + 1 components b;, ¢1;, --+ , ¢z; 
dependently distributed like the random vector with components }, ¢, -- 
defined in terms of the basic vector m as follows: 


(9) 
(10) 


We note that b and the {c;} have zero means, and that their variances and co- 
variances depend on the elements of the covariance matrix (¢;,;) of the vector 
m in the following way: 


(11) var (b) = o.. 


’ 

(12) CoV (c;, cy) = Oy — Oy — CF. a C..,5 

(13) cov (b, c;) = oj. — a... 

The main effects {b;} and interactions {c,;;} in (1) thus have zero means, and the 
variances and covariances within a set {b;, ¢1;, +--+, ¢rj;} are given by (11), 


(12), (13), while the covariance of any member of this set with any member of 
the set {b,y, iy, +++ , Cry} is zero for7 ¥ 7’. 
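As a small numerical illustration of (11), (12), and (13), the following sketch computes the covariance structure of $b$ and the $\{c_i\}$ from a given covariance matrix of $m$. The function name `effect_covariances` and the 3 x 3 matrix are hypothetical, chosen only for illustration; exact rational arithmetic is used so that the constraint $\sum_i c_i = 0$ shows up as exactly zero row sums.

```python
from fractions import Fraction


def effect_covariances(sigma):
    """Return var(b), cov(c_i, c_i'), cov(b, c_i) from the covariance matrix of m."""
    I = len(sigma)
    row = [sum(sigma[i]) / I for i in range(I)]                        # sigma_{i.}
    col = [sum(sigma[i][k] for i in range(I)) / I for k in range(I)]   # sigma_{.i'}
    grand = sum(row) / I                                               # sigma_{..}
    var_b = grand                                                      # (11)
    cov_c = [[sigma[i][k] - row[i] - col[k] + grand for k in range(I)]
             for i in range(I)]                                        # (12)
    cov_bc = [row[i] - grand for i in range(I)]                        # (13)
    return var_b, cov_c, cov_bc


# Hypothetical covariance matrix of the basic vector m (I = 3).
sigma = [[Fraction(4), Fraction(1), Fraction(0)],
         [Fraction(1), Fraction(3), Fraction(1)],
         [Fraction(0), Fraction(1), Fraction(2)]]
var_b, cov_c, cov_bc = effect_covariances(sigma)
```

Because the $c_i$ sum to zero, every row of `cov_c` and the vector `cov_bc` should sum to zero, which gives a quick check on any hand computation.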





HENRY SCHEFFE 


We shall be led to the appropriate definitions of the "variance components"
$\sigma_A^2$, $\sigma_B^2$, $\sigma_{AB}^2$ by way of the analogy with the "finite model", where the vector $m$
can take on only one of a finite number, $Q$, of values, the $q$th having components,
say, $\mu_{1q}, \ldots, \mu_{Iq}$. For the corresponding $I \times Q$ rectangular array, the usual
definitions, chosen to give the simplest formulas for the expectations of the mean
squares customarily computed, are

(14)   $\sigma_A^2 = (I - 1)^{-1} \sum_i (\mu_{i.} - \mu_{..})^2$,
(15)   $\sigma_B^2 = (Q - 1)^{-1} \sum_q (\mu_{.q} - \mu_{..})^2$,
(16)   $\sigma_{AB}^2 = (I - 1)^{-1} (Q - 1)^{-1} \sum_i \sum_q (\mu_{iq} - \mu_{i.} - \mu_{.q} + \mu_{..})^2$.

If we regard our previous infinite model as a limiting case of the finite model as
$Q \to \infty$, we see that the analogues of the above formulas are to be found by
replacing $\mu_{iq}$, $\mu_{i.}$, $\mu_{.q}$, $\mu_{..}$, $(Q - 1)^{-1} \sum_q$ by $m_i$, $\mu_i$, $m_{.}$, $\mu$, $E$, respectively, and
we are led to the following definitions, which we shall adopt for the infinite
model:

(17)   $\sigma_A^2 = (I - 1)^{-1} \sum_i \alpha_i^2$,
(18)   $\sigma_B^2 = \mathrm{var}(b)$,
(19)   $\sigma_{AB}^2 = (I - 1)^{-1} \sum_i \mathrm{var}(c_i)$.

The variance components $\sigma_B^2$ and $\sigma_{AB}^2$ may be expressed in terms of the elements
of the covariance matrix $(\sigma_{ii'})$ of the vector $m$, from (11) and (12),

(20)   $\sigma_B^2 = \sigma_{..}$,
(21)   $\sigma_{AB}^2 = (I - 1)^{-1} \sum_i (\sigma_{ii} - \sigma_{..})$.

We note that $\sigma_B^2 = 0$ if and only if $b = 0$ (we omit the phrase "with probability
one" here and elsewhere where it obviously applies), that is, if and only if the
basic vector $m$ has a degenerate distribution satisfying $\sum_i m_i = \text{constant} = I\mu$.
Also, $\sigma_{AB}^2 = 0$ if and only if $\mathrm{var}(c_i) = 0$ for all $i$, or $c_i = 0$ for all $i$, or $m_i =
m_{.} + \alpha_i$; that is, except for additive constants $\{\alpha_i\}$, the random variables $m_i$
are identical (not just identically distributed). Some further insight into our
definitions of the random main and interaction effects and their variance
components may be obtained by considering the symmetric case ($\mathcal{S}$) where $\sigma_{ii'} =
\rho\sigma^2$ if $i \ne i'$, $\sigma_{ii} = \sigma^2$. Then, from (20) and (21),

(22)   $\sigma_B^2 = \sigma^2 [1 + \rho(I - 1)]/I$,
(23)   $\sigma_{AB}^2 = \sigma^2 (1 - \rho)$,

where $-(I - 1)^{-1} \le \rho \le 1$. These relations are shown graphically in Fig. 1.

[Fig. 1. Variance components $\sigma_B^2$ and $\sigma_{AB}^2$ in symmetric case ($\mathcal{S}$)]

The previously mentioned objection to assuming ($\mathcal{S}$) in the infinite model is
that its analogue in the finite model is the fulfillment of the following
$\frac{1}{2}I(I + 1) - 2$ conditions: If $g_{ii'}$ denotes

(24)   $\sum_q (\mu_{iq} - \mu_{i.})(\mu_{i'q} - \mu_{i'.})$,

then all $g_{ii}$ are equal, and all $g_{ii'}$ with $i \ne i'$ are equal. There would seem to be
nothing in most applications to justify this.
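The relations (20) to (23) can be checked numerically. The sketch below is only an illustration under assumed inputs: the helper names `variance_components` and `symmetric_sigma` and the values $I = 4$, $\sigma^2 = 2$, $\rho = 1/4$ are hypothetical. It evaluates (20) and (21) on a covariance matrix of the symmetric form ($\mathcal{S}$) and compares the result with the closed forms (22) and (23).

```python
from fractions import Fraction


def variance_components(sigma):
    """sigma_B^2 and sigma_AB^2 from the covariance matrix of m, by (20) and (21)."""
    I = len(sigma)
    grand = sum(sum(r) for r in sigma) / (I * I)                     # sigma_{..}
    sigma_b2 = grand                                                 # (20)
    sigma_ab2 = sum(sigma[i][i] - grand for i in range(I)) / (I - 1)  # (21)
    return sigma_b2, sigma_ab2


def symmetric_sigma(I, s2, rho):
    """Covariance matrix with equal variances s2 and equal covariances rho * s2."""
    return [[s2 if i == k else rho * s2 for k in range(I)] for i in range(I)]


# Hypothetical values for the symmetric case.
I, s2, rho = 4, Fraction(2), Fraction(1, 4)
sigma_b2, sigma_ab2 = variance_components(symmetric_sigma(I, s2, rho))

# At the lower limit rho = -1/(I - 1) the component sigma_B^2 vanishes.
sigma_b2_min, _ = variance_components(symmetric_sigma(I, s2, Fraction(-1, I - 1)))
```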


5. Expected mean squares and point estimates. We shall consider the customary
sums of squares, namely those for rows, columns, interaction, and error,
which we shall denote by $(SS)_A$, $(SS)_B$, $(SS)_{AB}$, $(SS)_e$, respectively, and the
corresponding mean squares,

(25)   $(MS)_A = (I - 1)^{-1}(SS)_A$,
(26)   $(MS)_B = (J - 1)^{-1}(SS)_B$,
(27)   $(MS)_{AB} = (I - 1)^{-1}(J - 1)^{-1}(SS)_{AB}$,
(28)   $(MS)_e = I^{-1}J^{-1}(K - 1)^{-1}(SS)_e$,

where

(29)   $(SS)_A = JK \sum_i (y_{i..} - y_{...})^2$,
(30)   $(SS)_B = IK \sum_j (y_{.j.} - y_{...})^2$,
(31)   $(SS)_{AB} = K \sum_i \sum_j (y_{ij.} - y_{i..} - y_{.j.} + y_{...})^2$,
(32)   $(SS)_e = \sum_i \sum_j \sum_k (y_{ijk} - y_{ij.})^2$.

In addition, we shall need the contribution to $(SS)_{AB}$ from the $i$th row,

(33)   $(SS)_{AB,i} = K \sum_j (y_{ij.} - y_{i..} - y_{.j.} + y_{...})^2$,

and its mean square

(34)   $(MS)_{AB,i} = (J - 1)^{-1}(SS)_{AB,i}$.
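The definitions (25) to (34) translate directly into a short computation. The following is a minimal sketch, assuming the data are held as a nested list `y[i][j][k]` with equal numbers in every cell; the function name `two_way_sums_of_squares` and the data values are hypothetical.

```python
from fractions import Fraction


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def two_way_sums_of_squares(y):
    """y[i][j][k]: observation k in row i (factor A) and column j (factor B)."""
    I, J, K = len(y), len(y[0]), len(y[0][0])
    cell = [[mean(y[i][j]) for j in range(J)] for i in range(I)]     # y_{ij.}
    row = [mean(cell[i]) for i in range(I)]                          # y_{i..}
    col = [mean(cell[i][j] for i in range(I)) for j in range(J)]     # y_{.j.}
    grand = mean(row)                                                # y_{...}
    ss_ab_i = [K * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                       for j in range(J)) for i in range(I)]         # (33)
    ss = {
        "A": J * K * sum((r - grand) ** 2 for r in row),             # (29)
        "B": I * K * sum((c - grand) ** 2 for c in col),             # (30)
        "AB": sum(ss_ab_i),                                          # (31)
        "e": sum((y[i][j][k] - cell[i][j]) ** 2 for i in range(I)
                 for j in range(J) for k in range(K)),               # (32)
    }
    ms = {
        "A": ss["A"] / (I - 1),                                      # (25)
        "B": ss["B"] / (J - 1),                                      # (26)
        "AB": ss["AB"] / ((I - 1) * (J - 1)),                        # (27)
        "e": ss["e"] / (I * J * (K - 1)) if K > 1 else None,         # (28)
        "AB_i": [s / (J - 1) for s in ss_ab_i],                      # (34)
    }
    return ss, ms, grand


# Hypothetical data: I = 2 machines, J = 3 workers, K = 2 replicates.
raw = [[[5, 7], [6, 8], [9, 9]],
       [[3, 4], [7, 5], [6, 8]]]
y = [[[Fraction(v) for v in c] for c in r] for r in raw]
ss, ms, grand = two_way_sums_of_squares(y)
```

With $K = 1$ the error mean square is undefined, so the sketch returns `None` for it rather than dividing by zero.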


In deriving the expected mean squares we will utilize the following three
formulas for a set of independently and identically distributed random
variables, $x_1, \ldots, x_N$, with variance $\sigma_x^2$:

(35)   $\mathrm{var}(x_{.}) = N^{-1}\sigma_x^2$,
(36)   $\mathrm{var}(x_\nu - x_{.}) = (1 - N^{-1})\sigma_x^2$,
(37)   $\sum_\nu E(x_\nu - x_{.})^2 = (N - 1)\sigma_x^2$.

It is convenient to define now

(38)   $\hat\alpha_i = y_{i..} - y_{...}$.

We have from (1),

(39)   $\hat\alpha_i = \alpha_i + c_{i.} + e_{i..} - e_{...}$,

since $c_{.j} = 0$ and hence $c_{..} = 0$. It follows that

(40)   $E(\hat\alpha_i) = \alpha_i$

and

(41)   $\mathrm{var}(\hat\alpha_i) = \mathrm{var}(c_{i.}) + \mathrm{var}(e_{i..} - e_{...})$
              $= J^{-1}\mathrm{var}(c_i) + (1 - I^{-1})\,\mathrm{var}(e_{i..})$,

from (35) and (36). Again from (35),

(42)   $\mathrm{var}(\hat\alpha_i) = J^{-1}[\mathrm{var}(c_i) + K^{-1}(1 - I^{-1})\sigma_e^2]$.

Writing

(43)   $(SS)_A = JK \sum_i \hat\alpha_i^2$,

we may substitute (42) in

(44)   $E(SS)_A = JK \sum_i [\mathrm{var}(\hat\alpha_i) + \alpha_i^2]$

to get

(45)   $E(SS)_A = K \sum_i \mathrm{var}(c_i) + (I - 1)\sigma_e^2 + JK \sum_i \alpha_i^2$.

Using the definitions (17) and (19), we then find that

(46)   $E(MS)_A = JK\sigma_A^2 + K\sigma_{AB}^2 + \sigma_e^2$.

After substituting (1) into (30) we have

(47)   $(SS)_B = IK \sum_j (b_j - b_{.} + e_{.j.} - e_{...})^2$,

and so from (37),

(48)   $E(SS)_B = IK(J - 1)\,\mathrm{var}(b_j + e_{.j.})$
              $= IK(J - 1)(\sigma_B^2 + I^{-1}K^{-1}\sigma_e^2)$;

hence

(49)   $E(MS)_B = IK\sigma_B^2 + \sigma_e^2$.

Substitution of (1) in (33) gives

(50)   $(SS)_{AB,i} = K \sum_j (c_{ij} - c_{i.} + e_{ij.} - e_{i..} - e_{.j.} + e_{...})^2$,

whence

(51)   $E(SS)_{AB,i} = K \sum_j E(c_{ij} - c_{i.})^2 + K \sum_j E(e_{ij.} - e_{i..} - e_{.j.} + e_{...})^2$.

Call the last term $a_i$. It is clear that the value of $a_i$ does not depend on $i$, and it is
known from Model I theory that $\sum_i a_i = (I - 1)(J - 1)\sigma_e^2$. Thus,
$a_i = (1 - I^{-1})(J - 1)\sigma_e^2$. By (37), the first term on the right of (51) may be
written $K(J - 1)\,\mathrm{var}(c_i)$. Hence,

(52)   $E(SS)_{AB,i} = (J - 1)[K\,\mathrm{var}(c_i) + (1 - I^{-1})\sigma_e^2]$,
(53)   $E(MS)_{AB,i} = K\,\mathrm{var}(c_i) + (1 - I^{-1})\sigma_e^2$.

Summing (52) over $i$ and dividing by $(I - 1)(J - 1)$, we get

(54)   $E(MS)_{AB} = K\sigma_{AB}^2 + \sigma_e^2$.

Finally, if we rewrite (32) as

(55)   $(SS)_e = \sum_i \sum_j \sum_k (e_{ijk} - e_{ij.})^2$,

we see that for $K > 1$

(56)   $E(MS)_e = \sigma_e^2$.


We shall use the noun "estimate" always to mean "unbiased estimate." The
above formulas for the expected mean squares lead to the following estimates of
$\sigma_B^2$, $\sigma_{AB}^2$, $\sigma_e^2$, respectively, if $K > 1$:

(57)   $\hat\sigma_B^2 = (IK)^{-1}[(MS)_B - (MS)_e]$,
(58)   $\hat\sigma_{AB}^2 = K^{-1}[(MS)_{AB} - (MS)_e]$,
(59)   $\hat\sigma_e^2 = (MS)_e$.

An estimate of $\alpha_i$ is the $\hat\alpha_i$ defined by (38); an estimate of its variance (42) is
$J^{-1}K^{-1}(MS)_{AB,i}$. An estimate of $\mu_i = \mu + \alpha_i$ is $y_{i..}$; an estimate of its variance
is $J^{-1}\hat\tau_{ii}$, where $\hat\tau_{ii}$ is defined by (62) below. An estimate of $\alpha_i - \alpha_{i'}$ is
$y_{i..} - y_{i'..}$; an estimate of its variance is

(60)   $S_{ii'}^2 = J^{-1}(J - 1)^{-1} \sum_j (y_{ij.} - y_{i'j.} - y_{i..} + y_{i'..})^2$.


In order to estimate the covariance matrix $(\sigma_{ii'})$ of the basic vector $m$, we note
that the $J$ columns of cell means $\{y_{ij.}\}$ are distributed independently like a
random vector $u = m + v$, where $v$ is statistically independent of $m$ and has the
distribution of the vector with components $e_{1j.}, \ldots, e_{Ij.}$ (which distribution
does not depend on $j$). It follows that the covariance matrix of $u$ is $(\tau_{ii'})$, where

(61)   $\tau_{ii'} = \sigma_{ii'} + \delta_{ii'}K^{-1}\sigma_e^2$,

and $\delta_{ii'} = 1$ if $i = i'$, 0 if $i \ne i'$. An estimate of $\tau_{ii'}$ is the sample covariance of
the $i$th row of cell means $\{y_{ij.}\}$ with the $i'$-th row,

(62)   $\hat\tau_{ii'} = (J - 1)^{-1} \sum_j (y_{ij.} - y_{i..})(y_{i'j.} - y_{i'..})$,

and hence if $K > 1$, an estimate of $\sigma_{ii'}$ is

(63)   $\hat\sigma_{ii'} = \hat\tau_{ii'} - \delta_{ii'}K^{-1}\hat\sigma_e^2$.

We remark that if we estimate $\sigma_B^2$ and $\sigma_{AB}^2$ by substituting the estimates (63) in
(20) and (21), we get the same estimates as before in (57) and (58).
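The closing remark can be verified numerically. The sketch below, with hypothetical function names and made-up data ($I = 3$, $J = 4$, $K = 2$), computes $\hat\tau_{ii'}$ and $\hat\sigma_{ii'}$ from (62) and (63), substitutes them in (20) and (21), and compares the result with (57) and (58). Exact rational arithmetic is assumed so that the agreement is an identity rather than an approximation.

```python
from fractions import Fraction


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def covariance_estimates(y):
    """Mean squares for B, AB, e and the estimated covariance matrix of m."""
    I, J, K = len(y), len(y[0]), len(y[0][0])
    cell = [[mean(y[i][j]) for j in range(J)] for i in range(I)]
    row = [mean(cell[i]) for i in range(I)]
    col = [mean(cell[i][j] for i in range(I)) for j in range(J)]
    grand = mean(row)
    ms_b = I * K * sum((c - grand) ** 2 for c in col) / (J - 1)
    ms_ab = K * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                    for i in range(I) for j in range(J)) / ((I - 1) * (J - 1))
    ms_e = sum((y[i][j][k] - cell[i][j]) ** 2 for i in range(I)
               for j in range(J) for k in range(K)) / (I * J * (K - 1))
    tau_hat = [[sum((cell[i][j] - row[i]) * (cell[h][j] - row[h])
                    for j in range(J)) / (J - 1)
                for h in range(I)] for i in range(I)]                # (62)
    sigma_hat = [[tau_hat[i][h] - (ms_e / K if i == h else 0)
                  for h in range(I)] for i in range(I)]              # (63)
    return ms_b, ms_ab, ms_e, sigma_hat


def components_from_sigma(sigma):
    """Apply (20) and (21) to a covariance matrix (true or estimated)."""
    I = len(sigma)
    grand = sum(sum(r) for r in sigma) / (I * I)
    return grand, sum(sigma[i][i] - grand for i in range(I)) / (I - 1)


# Hypothetical data: I = 3, J = 4, K = 2.
raw = [[[5, 7], [6, 8], [9, 9], [4, 6]],
       [[3, 4], [7, 5], [6, 8], [2, 5]],
       [[8, 6], [9, 7], [5, 5], [7, 4]]]
y = [[[Fraction(v) for v in c] for c in r] for r in raw]
I, K = 3, 2
ms_b, ms_ab, ms_e, sigma_hat = covariance_estimates(y)
sigma_b2_hat, sigma_ab2_hat = components_from_sigma(sigma_hat)
```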


6. Distribution theory under the normality assumption. Under the normality
assumption ($\mathcal{N}$) of Section 3, the four sums of squares $(SS)_A$, $(SS)_B$, $(SS)_{AB}$,
$(SS)_e$ are pairwise independent, except for the pair $(SS)_B$, $(SS)_{AB}$. We shall
prove this for the pair $(SS)_A$, $(SS)_{AB}$; the independence of the other pairs may
be verified similarly.

Let us write

(64)   $(SS)_A = JK \sum_{i'} L_{i'}^2$,
(65)   $(SS)_{AB} = K \sum_i \sum_j L_{ij}^2$,

where

(66)   $L_{i'} = A_{i'} + B_{i'}$,
(67)   $L_{ij} = A_{ij} + B_{ij}$,
(68)   $A_{i'} = \alpha_{i'} + c_{i'.}$,
(69)   $B_{i'} = e_{i'..} - e_{...}$,
(70)   $A_{ij} = c_{ij} - c_{i.}$,
(71)   $B_{ij} = e_{ij.} - e_{i..} - e_{.j.} + e_{...}$.

Then it suffices because of the joint normality of the set $\{L_{i'}, L_{ij}\}$ to prove
$\mathrm{cov}(L_{i'}, L_{ij}) = 0$ for all $i'$, $i$, $j$. Now, any $B$ just defined is independent of any
$A$ because of our assumption that the set $\{e_{ijk}\}$ is independent of the set $\{m_{ij}\}$.
Furthermore, $B_{i'}$ and $B_{ij}$ are orthogonal by the familiar Model I theory. Hence,
it remains only to show $\mathrm{cov}(A_{i'}, A_{ij}) = 0$:





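$\mathrm{cov}(A_{i'}, A_{ij}) = E[c_{i'.}(c_{ij} - c_{i.})] = E[J^{-1} \sum_{j'} c_{i'j'}(c_{ij} - J^{-1} \sum_{j''} c_{ij''})]$
    $= J^{-1} \sum_{j'} E(c_{i'j'}c_{ij}) - J^{-2} \sum_{j'} \sum_{j''} E(c_{i'j'}c_{ij''})$
    $= J^{-1}E(c_{i'}c_i) - J^{-1}E(c_{i'}c_i) = 0$,

since $E(c_{i'j'}c_{ij}) = \delta_{j'j}E(c_{i'}c_i)$.

The above proof shows also that $\hat\alpha_i$ is statistically independent of $(SS)_{AB,i}$,
since $\hat\alpha_i = L_i$ and $(SS)_{AB,i} = K \sum_j L_{ij}^2$.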

From (55) it follows that $(SS)_e$ is distributed as $\sigma_e^2$ times $\chi^2$ with $IJ(K - 1)$
d.f. To see that $(SS)_B$ is distributed as $E(MS)_B$ times $\chi^2$ with $J - 1$ d.f., write
$f_j = b_j + e_{.j.}$ in (47), so that $(SS)_B = IK \sum_j (f_j - f_{.})^2$, where the set $\{f_j\}$
are independently $N(0, \sigma_f^2)$ (normal with mean 0 and variance $\sigma_f^2$) with $\sigma_f^2 =
\sigma_B^2 + I^{-1}K^{-1}\sigma_e^2$, and hence $(SS)_B$ is $IK\sigma_f^2$ times $\chi^2$ with $J - 1$ d.f. Similarly,
putting $g_j = c_{ij} + e_{ij.} - e_{.j.}$ in (50), we find $(SS)_{AB,i}$ is $E(MS)_{AB,i}$ times $\chi^2$
with $J - 1$ d.f. It may be shown that for $I > 2$, $(SS)_A$ and $(SS)_{AB}$ are not, in
general, distributed as a constant times noncentral (which includes central) $\chi^2$.
However, under the hypothesis $H_{AB}$ that $\sigma_{AB}^2 = 0$, all $c_{ij} = 0$, so $(SS)_{AB}$
becomes simply

(73)   $K \sum_i \sum_j (e_{ij.} - e_{i..} - e_{.j.} + e_{...})^2$,

which is known from Model I theory to be distributed as $\sigma_e^2$ times $\chi^2$
with $(I - 1)(J - 1)$ d.f.

The obvious consequence of our assumptions, that the $J$ columns of cell
means $\{y_{ij.}\}$ are independently distributed like an $I$-variate normal vector with
means $\mu_1, \ldots, \mu_I$ and covariance matrix $(\tau_{ii'})$ given by (61), we shall utilize
in the next section.


7. Tests and confidence intervals. Suppose first that $K > 1$. Then the $\chi^2$-
distribution of $(SS)_e/\sigma_e^2$ affords confidence intervals for $\sigma_e^2$ in the usual way.

Since the quotient of $(MS)_B/(IK\sigma_B^2 + \sigma_e^2)$ by $(MS)_e/\sigma_e^2$ has the F-distribution
with $J - 1$ and $IJ(K - 1)$ d.f., confidence intervals for $\sigma_B^2/\sigma_e^2$ are available
as well as tests of the hypothesis that $\sigma_B^2 = 0$, or, more generally, that
$\sigma_B^2/\sigma_e^2 \le c$, a given constant. The test at the $\alpha$ level of significance consists of
rejecting the hypothesis if and only if $(MS)_B/(MS)_e \ge (IKc + 1)F_\alpha$, where
$F_\alpha$ is the upper $\alpha$ point of the F-distribution. The power of the test can be
expressed in terms of the (central) F-distribution.

The hypothesis $H_{AB}$: $\sigma_{AB}^2 = 0$ may be tested with the statistic $(MS)_{AB}/(MS)_e$,
which, under $H_{AB}$, has the F-distribution with $(I - 1)(J - 1)$ and
$IJ(K - 1)$ d.f. Since this statistic is distributed as the quotient of a linear
combination (with coefficients in general unequal) of independent $\chi^2$ variables by
another independent $\chi^2$ variable, the power of the test is not expressible in terms
of the noncentral F-distribution, but it could be approximated by use of a central
F-distribution by using methods of Box [2].

We now drop the restriction $K > 1$. Even though $(MS)_A$ and $(MS)_{AB}$ are
statistically independent and under the hypothesis $H_A$: all $\alpha_i = 0$ have the
same expected values, their quotient does not, in general, have the F-distribution
under $H_A$. A test of $H_A$ based on Hotelling's $T^2$ statistic is given in the
next paragraph. However, confidence intervals for a particular $\alpha_i$, a particular
$\mu_i$, or a particular difference $\alpha_i - \alpha_{i'}$ (none of these selected according to the
outcome of the experiment) can be based on the $t$-distribution with $J - 1$
d.f. of the respective quotients

(74)   $(JK)^{1/2}(\hat\alpha_i - \alpha_i)/(MS)_{AB,i}^{1/2}$,
(75)   $J^{1/2}(y_{i..} - \mu_i)/\hat\tau_{ii}^{1/2}$,
(76)   $[(\hat\alpha_i - \hat\alpha_{i'}) - (\alpha_i - \alpha_{i'})]/S_{ii'}$,

where the denominators are defined by (34), (62), and (60).

We assume now that $J \ge I$. To calculate Hotelling's $T^2$ statistic for $H_A$,
and, in case we find it significant, to make multiple comparisons, we construct
a rectangular table with $R = I - 1$ rows and $J$ columns, the entry in the $r$th
row and $j$th column being

(77)   $d_{rj} = y_{rj.} - y_{Ij.}$,

and we compute the $R$ means $\{d_{r.}\}$ and the $\frac{1}{2}R(R + 1)$ sums of products (which,
divided by $J(J - 1)$, are estimates of the covariances of the $\{d_{r.}\}$)

(78)   $a_{rr'} = \sum_j (d_{rj} - d_{r.})(d_{r'j} - d_{r'.}) = \sum_j d_{rj}d_{r'j} - Jd_{r.}d_{r'.}$.

The $T^2$ statistic is (except for a constant factor)

(79)   $F = J(J - I + 1)(I - 1)^{-1}Q$,

where $Q$ is the quadratic form

(80)   $Q = \sum_r \sum_{r'} a^{rr'} d_{r.}d_{r'.}$,

and $(a^{rr'})$ is the matrix inverse to $(a_{rr'})$. It is not necessary actually to compute
the inverse matrix, since $Q$ may be written in a form given by Rao [10] in terms
of two determinants of order $R$ calculated from $(a_{rr'})$,

(81)   $Q = \dfrac{|a_{rr'} + d_{r.}d_{r'.}|}{|a_{rr'}|} - 1$.

The statistic $F$ in (79) has under $H_A$ the F-distribution with $I - 1$ and $J - I + 1$
d.f., so that if $F_\alpha$ denotes the upper $\alpha$ point of the F-distribution with these
numbers of d.f., $H_A$ is rejected at the $\alpha$ level of significance if and only if $F > F_\alpha$.
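The computation in (77) to (81) can be sketched as follows. This is an illustration only: the function names `det` and `hotelling_F` and the cell means are hypothetical, and the determinant routine is a plain Gaussian elimination written out so that the block needs no external library. The input is the $I \times J$ table of cell means $y_{ij.}$, and the last row plays the role of the $I$th row in (77).

```python
from fractions import Fraction


def det(matrix):
    """Determinant by Gaussian elimination (exact when the entries are Fractions)."""
    a = [row[:] for row in matrix]
    n = len(a)
    d = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def hotelling_F(cell):
    """cell[i][j] = y_{ij.}; returns (Q, F) as in (79) and (81)."""
    I, J = len(cell), len(cell[0])
    R = I - 1
    d = [[cell[r][j] - cell[I - 1][j] for j in range(J)] for r in range(R)]   # (77)
    dbar = [sum(d[r]) / J for r in range(R)]
    a = [[sum((d[r][j] - dbar[r]) * (d[s][j] - dbar[s]) for j in range(J))
          for s in range(R)] for r in range(R)]                               # (78)
    a_plus = [[a[r][s] + dbar[r] * dbar[s] for s in range(R)] for r in range(R)]
    Q = det(a_plus) / det(a) - 1                                              # (81)
    F = J * (J - I + 1) * Q / (I - 1)                                         # (79)
    return Q, F


# Hypothetical cell means with I = 2 rows and J = 4 columns.
cell2 = [[Fraction(v) for v in r] for r in [[5, 7, 6, 9], [3, 4, 6, 5]]]
Q2, F2 = hotelling_F(cell2)

# Hypothetical cell means with I = 3 rows and J = 5 columns.
cell3 = [[Fraction(v) for v in r]
         for r in [[1, 4, 2, 8, 5], [3, 3, 7, 1, 6], [2, 9, 4, 4, 0]]]
Q3, F3 = hotelling_F(cell3)
```

The resulting `F` would then be compared with the upper $\alpha$ point of the F-distribution with $I - 1$ and $J - I + 1$ d.f., taken from tables or from whatever statistical library is at hand.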

The above form of the $T^2$ test appears to lack symmetry, since the $I$th row
plays a distinguished role. It is easy to see that if instead of the $\{d_{r.}\}$, any basis
is used for the $(I - 1)$-dimensional space spanned by the differences $\{y_{i..} - y_{i'..}\}$,
the same test would be obtained. A symmetric form of $Q$ (and of the noncentrality
parameter $\delta^2$ below) was given by Hsu [7], but this form would involve
more numerical calculation.

The power of the $T^2$ test of $H_A$ may be expressed [7] in terms of the noncentral
F-distribution: The statistic (79) is distributed as noncentral $F$ with $I - 1$
and $J - I + 1$ d.f. and noncentrality parameter $\delta^2$, whose value will be given
below, that is, as

(82)   $(I - 1)^{-1}(J - I + 1)\left[(z_1 + \delta)^2 + \sum_{\nu=2}^{I-1} z_\nu^2\right] \Big/ \sum_{\nu=I}^{J} z_\nu^2$,

where the $\{z_\nu\}$ are independently $N(0, 1)$. The noncentrality parameter $\delta^2$ has
the value

(83)   $\delta^2 = \sum_r \sum_{r'} \alpha^{rr'} \delta_r \delta_{r'}$,

where $(\alpha^{rr'})$ is the matrix inverse to that with elements

(84)   $\alpha_{rr'} = \mathrm{cov}(d_{r.}, d_{r'.}) = J^{-1}(\tau_{rr'} - \tau_{rI} - \tau_{Ir'} + \tau_{II})$,

and

(85)   $\delta_r = \alpha_r - \alpha_I = \mu_r - \mu_I$.

In his paper in 1931 on the $T^2$ test, Hotelling [6] gave an associated confidence
ellipsoid. Recently, the writer [11] published a method of multiple comparison
derived from the confidence ellipsoid associated with the F-test for equality of
means in Model I. The same method, when based on Hotelling's confidence
ellipsoid, tells us the following: Let $\theta$ be any contrast among the $\{\alpha_i\}$ or $\{\mu_i\}$,
$\theta = \sum_i h_i\alpha_i = \sum_i h_i\mu_i$, where $\{h_i\}$ is any set of known constants satisfying
$\sum_i h_i = 0$. Let $\hat\theta$ be the estimate $\hat\theta = \sum_i h_i y_{i..} = \sum_r h_r d_{r.}$, so that its variance
$\sigma_{\hat\theta}^2 = \sum_r \sum_{r'} h_r h_{r'} \alpha_{rr'}$ is estimated by

(86)   $\hat\sigma_{\hat\theta}^2 = J^{-1}(J - 1)^{-1} \sum_r \sum_{r'} h_r h_{r'} a_{rr'}$.

Then, for the totality of contrasts $\{\theta\}$, the probability is $1 - \alpha$ that they
simultaneously satisfy

(87)   $\hat\theta - S\hat\sigma_{\hat\theta} \le \theta \le \hat\theta + S\hat\sigma_{\hat\theta}$,

where the constant $S$ is calculated from $F_\alpha$, the upper $\alpha$ point of the F-distribution
with $I - 1$ and $J - I + 1$ d.f., by

(88)   $S^2 = (I - 1)(J - 1)(J - I + 1)^{-1}F_\alpha$.

Whenever the $T^2$ test rejects $H_A$ at significance level $\alpha$, there will exist contrasts
$\theta$ for which the interval (87) does not cover zero (and conversely). However, it
may occasionally happen in applications that none of the contrasts thus found
to be "significantly different from zero" is of any practical interest.

An interesting interpretation of the quantities $\hat\sigma_{\hat\theta}$ needed in (87), which yields
an alternative way of calculating them not requiring calculation of the $\{a_{rr'}\}$ or
$\{d_{rj}\}$, is the following³: Let $\hat\theta_j$ be the estimate of $\theta$ from the $j$th column, $\hat\theta_j =
\sum_i h_i y_{ij.}$, so $\hat\theta = \hat\theta_{.}$; then

(89)   $\hat\sigma_{\hat\theta}^2 = J^{-1}(J - 1)^{-1} \sum_j (\hat\theta_j - \hat\theta_{.})^2$.

However, if the calculations for the $T^2$ test of $H_A$ have already been made, use
of (87) may be faster.

³ Pointed out to me by Professor J. W. Tukey.
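The agreement of the two routes to $\hat\sigma_{\hat\theta}^2$, (86) through the sums of products $a_{rr'}$ and (89) through the column estimates $\hat\theta_j$, can be illustrated as below. The function names, the cell means, the contrast coefficients, and the value passed as `F_alpha` are all hypothetical; in practice $F_\alpha$ would be read from F-tables with $I - 1$ and $J - I + 1$ d.f.

```python
from fractions import Fraction
from math import sqrt


def contrast_variance_by_columns(cell, h):
    """Estimate of theta and of its variance from column estimates, as in (89)."""
    I, J = len(cell), len(cell[0])
    theta_j = [sum(h[i] * cell[i][j] for i in range(I)) for j in range(J)]
    theta_hat = sum(theta_j) / J
    var_hat = sum((t - theta_hat) ** 2 for t in theta_j) / (J * (J - 1))
    return theta_hat, var_hat


def contrast_variance_by_products(cell, h):
    """Variance estimate from the sums of products a_{rr'}, as in (86)."""
    I, J = len(cell), len(cell[0])
    R = I - 1
    d = [[cell[r][j] - cell[I - 1][j] for j in range(J)] for r in range(R)]   # (77)
    dbar = [sum(d[r]) / J for r in range(R)]
    a = [[sum((d[r][j] - dbar[r]) * (d[s][j] - dbar[s]) for j in range(J))
          for s in range(R)] for r in range(R)]                               # (78)
    return sum(h[r] * h[s] * a[r][s] for r in range(R) for s in range(R)) / (J * (J - 1))


def simultaneous_interval(theta_hat, var_hat, F_alpha, I, J):
    """Interval (87) with S taken from (88); F_alpha is supplied by the user."""
    S = sqrt((I - 1) * (J - 1) * F_alpha / (J - I + 1))
    half = S * sqrt(var_hat)
    return float(theta_hat) - half, float(theta_hat) + half


# Hypothetical cell means (I = 3, J = 5) and contrast coefficients summing to zero.
cell = [[Fraction(v) for v in r]
        for r in [[1, 4, 2, 8, 5], [3, 3, 7, 1, 6], [2, 9, 4, 4, 0]]]
h = [Fraction(1), Fraction(-2), Fraction(1)]
theta_hat, var_cols = contrast_variance_by_columns(cell, h)
var_prod = contrast_variance_by_products(cell, h)
low, high = simultaneous_interval(theta_hat, var_cols, F_alpha=9.55, I=3, J=5)
```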


8. Concluding remarks. The $T^2$ test of $H_A$ and the associated method of
multiple comparison are valid under less restrictive assumptions about the errors
$\{e_{ijk}\}$. For instance, it would be sufficient that they be independently $N(0, \sigma_i^2)$,
or, more generally, that the $J$ vectors with components $e_{1j.}, \ldots, e_{Ij.}$ be
independently distributed like a normal random vector with zero means and an
arbitrary covariance matrix. The test and comparison method should be fairly
insensitive to violation of the normality assumption ($\mathcal{N}$), from consideration
of the asymptotic distribution when $J \to \infty$.

A common practice in the analysis of variance is to employ as statistic to test
a hypothesis the quotient of two independent mean squares whose expected
values are equal under the hypothesis, and to refer this statistic to the F-tables
with the numbers of d.f. equal to the ranks of the quadratic forms in the mean
squares. According to this practice, $(MS)_A/(MS)_{AB}$ would be treated as
though it had the F-distribution with $I - 1$ and $(I - 1)(J - 1)$ d.f. under
$H_A$. A justification of this would be welcomed by the practitioner, because the
computations are simpler and more familiar than those with Hotelling's $T^2$, but
until numerical investigations are made which indicate the errors involved are
tolerable, the practice should be suspect in the present case. The exact distribution
of the statistic under $H_A$ depends on unknown parameters. The distribution
has been treated by McCarthy [8], but in a form that does not seem useful for
$I > 3$. Some general theory for the distribution of ratios of statistically independent
quadratic forms in jointly normal variables has recently been given by
Box [2], and the above distribution falls under an application he made to another
problem ([3] pp. 489-490), where he approximates it by an F-distribution with
reduced numbers of d.f. However, these numbers of d.f. would depend on the
covariance matrix $(\tau_{ii'})$ whose elements are defined by (61), and if we were to
estimate these numbers from the data (with somewhat questionable effects
on the resulting test and multiple comparisons method) it would require computation
of the whole estimated covariance matrix $(\hat\tau_{ii'})$ defined by (62). The
amount of numerical work involved would then be comparable to that for the
above exact methods utilizing the $T^2$ statistic.

An interesting practical conclusion from the present model is that the number
$J$ of levels of the Model II factor should be at least a few more than the number
$I$ of levels of the Model I factor, since the $F$ statistic for the $T^2$ test has $J - I
+ 1$ d.f. in the denominator.

The writer acknowledges his inspiration from a paper by Tukey [13] in which
the expected mean squares in the mixed model fall out as limiting cases of those
obtained in sampling a finite model similar to the above with sampling of both
rows and columns, as the population number of columns becomes infinite, and
with the population number of rows equal to the sample number. The effect of
sampling the rows is to make all permutations of the rows equally probable and
thus impose the symmetry condition ($\mathcal{S}$). However, this does not affect the
expected mean squares we derived for A, B, AB, and e, since the formulas for the
corresponding sums of squares are invariant under permutation of the rows.
Wilk and Kempthorne [14] have recently calculated expected mean squares for a
model somewhat resembling Tukey's, closer to the above in that only columns
are sampled, but differing more in that the error term is generated solely by the
actual randomization used to assign the "treatment combinations" to experimental
units from a finite population, with the consequent introduction of
treatment-unit interactions: If these are neglected the expected mean squares of
Wilk and Kempthorne agree with Tukey's.

A multivariate normal model⁴ for randomized blocks was studied by McCarthy
[8] as an approximation to Neyman's [9] more realistic model reflecting
the randomization actually used in the assignment of the varieties to the plots
in each block. A test, implicitly assuming such a multivariate normal model for
randomized blocks, and employing Hotelling's $T^2$, was recently proposed by
Graybill [5]. A multivariate model for the analysis of variance was also considered
by Box [1] in a different situation where the condition ($\mathcal{S}$) was tenable,
and he included among other tests one of ($\mathcal{S}$). The application of Hotelling's
$T^2$ statistic to test the equality of the components of the vector of means in
samples from a multivariate normal population is due to Hsu [7].


⁴ When I discussed my results with Dr. Jerome Cornfield, he informed me that he and
Dr. Max Halperin had also considered a multivariate model for the present problem and
had been led to the $T^2$ test. Professor J. L. Hodges, Jr., formulated the above multivariate
model before I did. A model equivalent to the above under the assumptions ($\mathcal{N}$) and ($\mathcal{S}$)
was proposed earlier by Mr. Leon Herbach in an unpublished paper, in which he derived the
expectations and distributions of the usual mean squares and tests of the usual hypotheses.


REFERENCES
[1] G. E. P. Box, "Problems in the analysis of growth and wear curves," Biometrics, Vol.
6 (1950), pp. 362-389.
[2] G. E. P. Box, "Some theorems on quadratic forms applied in the study of analysis of
variance problems: I. Effect of inequality of variance in the one-way classification,"
Ann. Math. Stat., Vol. 25 (1954), pp. 290-302.
[3] G. E. P. Box, "Some theorems on quadratic forms applied in the study of analysis of
variance problems: II. Effect of inequality of variance and of correlation between
errors in the two-way classification," Ann. Math. Stat., Vol. 25 (1954), pp. 484-498.
[4] C. Eisenhart, "The assumptions underlying the analysis of variance," Biometrics,
Vol. 3 (1947), pp. 1-21.
[5] F. A. Graybill, "Variance heterogeneity in a randomized block design," Biometrics,
Vol. 10 (1954), pp. 516-520.
[6] H. Hotelling, "The generalization of Student's ratio," Ann. Math. Stat., Vol. 2
(1931), pp. 360-378.
[7] P. L. Hsu, "Notes on Hotelling's generalized $T$," Ann. Math. Stat., Vol. 9 (1938),
pp. 231-243.
[8] M. D. McCarthy, "On the application of the z-test to randomized blocks," Ann.
Math. Stat., Vol. 10 (1939), pp. 337-359.
[9] J. Neyman, "Statistical problems in agricultural experimentation," J. Roy. Stat.
Soc., Suppl., Vol. 2 (1935), pp. 107-180.
[10] C. R. Rao, "Tests of significance in multivariate analysis," Biometrika, Vol. 35 (1948),
pp. 58-79.
[11] H. Scheffé, "A method for judging all contrasts in the analysis of variance,"
Biometrika, Vol. 40 (1953), pp. 87-104.
[12] H. Scheffé, "Statistical methods for evaluation of several sets of constants and
several sources of variability," Chem. Eng. Progress, Vol. 50 (1954), pp. 200-205.
[13] J. W. Tukey, "Interaction in a row-by-column design," Statistical Research Group,
Princeton University, Memorandum Report 18 (1949).
[14] M. B. Wilk and O. Kempthorne, "Fixed, mixed, and random models in the analysis
of variance," J. Amer. Stat. Assn., Vol. 50 (1955), pp. 1144-1167.




KEEPING MOMENT-LIKE SAMPLING COMPUTATIONS SIMPLE¹
John W. Tukey
Princeton University


1. Summary. This is an attempt to present as simply as possible the best tools
we know today for keeping computations simple when dealing with samples from
general populations. Such computations seem inevitably to be made in terms of
quantities related to moments. We develop here the formal structure and
interrelations of the two systems of multi-index quantities which seem today to be
best adapted to statistical use. The occurrence of two systems is, at least in part,
related to the appearance in statistical problems of both multiplication and
addition of independent variables. Hence the existence of two systems, whose
limiting cases are moments (about a fixed point) and cumulants (or
semiinvariants).

We present interconversion formulas, developing definitions and proving the 
pairing formulas without reference to any infinite populations, and sparing the 
use of combinatorial techniques as much as we are able. A few multiplication 
formulas are given, but for a more complete list the reader is referred to Wishart 
[10]. It is hoped that this paper can be read on its own, with some reference to
applications of these techniques to elementary examples [6] and to the sampling
properties of estimated variance components in the analysis of variance [7],
[8], [9] as motivation.

The author's best thanks go to N. R. Goodman for the checking of certain
calculations.


2. Introduction. The history of “moments of moments’, still the only way we 
know to attack general sampling distributions, has been long and complicated. 
Its outstanding feature has been the cutting away of pages and pages of algebra 
by the introduction of new and sharper tools. The forging of these tools has 
depended more and more on combinatorial ideas, and while the use of the tools 
has become simpler, their understanding has become apparently more and more 
complex. It is the purpose of this paper to show how we can keep almost every- 
thing quite simple and still use what today seem to be the best tools. The only 
useful aspects which we cannot completely handle simply are the actual calcu- 
lation of certain multiplication formulas, an extensive table of which has been 
provided by Wishart [10]. 

The two systems which we shall discuss correspond to moments about a fixed 
point and to cumulants (semiinvariants). They do not correspond to moments 


Received April 6, 1955. 

1 Prepared in connection with research sponsored by the Office of Naval Research. 
Based on a part of Statistical Research Group, Princeton University, Memorandum Report 
45 ‘‘Finite Sampling Simplified,’ written while the author was a Fellow of the John Simon 
Guggenheim Foundation. 




about the mean, except insofar as the second and third moments about the mean
happen to be cumulants. From the point of view of practical use, either numerical
or algebraic, I am convinced that higher moments about the mean are a vermiform
appendix of statistical evolution, an evolutionary remnant which will
inevitably disappear, though all too slowly.

The two systems involve not a single index, but a set of one or more indices. 
At first sight this may seem redundant and wasteful, since there are relations 
which could be used to eliminate the multiple index symbols. But these relations 
involve sizes of samples or populations, and a great part of the simplicity of the 
use of the multiple index systems arises from a great reduction in the appearance 
of these sizes. 

We shall work entirely in terms of finite samples or populations, treating in- 
finite populations as special limiting cases. Contrary to the usual view, this does 
not make matters more complex. 


3. The two systems. The first system of polynomial symmetric functions of $n$
numbers $x_1, x_2, \ldots, x_n$ which we shall use are the symmetric means (the mean
power products, in combinatorial terminology), which we will denote by angle
brackets, as $\langle 3 \rangle$, $\langle 134 \rangle$, etc., and will refer to as symmetric means or brackets.
They are defined as the means of products of powers of different $x_i$'s, so that, for
example,

$\langle ab \rangle = \dfrac{\sum_{i \ne j} x_i^a x_j^b}{n(n - 1)}$,

where the sum is over the $n(n - 1)$ pairs $(i, j)$ with $i \ne j$. The numerators are a
kind of symmetric function of very respectable antiquity, the only modern
features being (i) the division by the number of cases to give a mean, which seems
natural to the statistician but perhaps not to the combinatorialist, and (ii)
the use of multiple subscripts, which the combinatorialist has always done but
which the statistician has seemed to resist. This resistance seems to have been
due to a feeling that only the simple moments

$\langle a \rangle = \dfrac{\sum_i x_i^a}{n}$

could be easily calculated from numerical data, and that hence all formulas should
be written in terms of moments. This position is tempting rather than irrefutable,
and the simplicity of formulas involving the multiple subscripts shows its
deficiencies. It is easy to continue the numerical calculation, once the moments are
at hand, and find all the symmetric functions of either system. For all of weight
$\le 4$ a simple computing form has been presented in [6].
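The definition of the brackets can be made concrete with a brute-force sketch. It is not the computing form of [6]; it simply averages the power products over all ordered selections of distinct indices, which is practical only for small $n$. The function name `bracket` and the sample values are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def bracket(xs, *powers):
    """Symmetric mean <ab...e>: mean of x_i^a x_j^b ... over distinct indices i, j, ..."""
    total, count = Fraction(0), 0
    for idx in permutations(range(len(xs)), len(powers)):
        term = Fraction(1)
        for i, p in zip(idx, powers):
            term *= Fraction(xs[i]) ** p
        total += term
        count += 1
    return total / count


# Hypothetical sample of n = 4 numbers.
xs = [1, 2, 4, 7]
m1 = bracket(xs, 1)        # <1>, the mean
m2 = bracket(xs, 2)        # <2>, the second moment about zero
m11 = bracket(xs, 1, 1)    # <11>, mean product of two different x's
```

For this sample the three values are related by $n\langle 1 \rangle^2 = \langle 2 \rangle + (n - 1)\langle 11 \rangle$, an instance of the relations involving the sample size that the multiple-index notation is meant to keep out of sight.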

The second, and even more important, system of polynomial symmetric functions
is most simply defined in terms of the first system through linear formulas
like these

$k_2 = (2) = \langle 2 \rangle - \langle 11 \rangle$,
$k_{113} = (113) = \langle 113 \rangle - 3\langle 1112 \rangle + 2\langle 11111 \rangle$.

We shall obtain the general law of formation and supply a compact table of
formulas up to weight 8. We shall usually refer to these as "polykays" for
the sake of a short, simple term. (While the construction of this term, "kay"
for "k" and "poly" for the multiple subscript, seems somewhat revolting to some
colleagues, the use of "generalized k-statistics" seems too unhandy to me. In
due course, perhaps, someone will find a good, short terminology.) The polykays
with but a single subscript are, of course, Fisher's famous k-statistics, whose
introduction was perhaps the largest step in clearing unnecessary algebra out
of this field. The multiple index analogs were introduced by Dressel [3] in 1940
in a combinatorial paper which seems to have escaped notice at the time of its
appearance. They were introduced independently by the author in 1950 [6]
as practical working tools.
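As a check on the first of these formulas, the sketch below computes $k_2 = \langle 2 \rangle - \langle 11 \rangle$ by brute force and compares it with the familiar variance estimate having divisor $n - 1$, which is Fisher's $k_2$. The helper names and the sample are hypothetical, and the enumeration over index selections is meant only for small samples.

```python
from fractions import Fraction
from itertools import permutations


def bracket(xs, *powers):
    """Symmetric mean <ab...e> over ordered selections of distinct indices."""
    total, count = Fraction(0), 0
    for idx in permutations(range(len(xs)), len(powers)):
        term = Fraction(1)
        for i, p in zip(idx, powers):
            term *= Fraction(xs[i]) ** p
        total += term
        count += 1
    return total / count


def k2(xs):
    """Polykay (2) = <2> - <11>."""
    return bracket(xs, 2) - bracket(xs, 1, 1)


def unbiased_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    xbar = Fraction(sum(xs), n)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


# Hypothetical sample.
xs = [1, 2, 4, 7]
k2_value = k2(xs)
variance_value = unbiased_variance(xs)
```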

4. Random pairing, additive and multiplicative. Let us next consider two sets
of $n$ numbers, $x_1^*, x_2^*, \ldots, x_n^*$ and $x_1^{**}, x_2^{**}, \ldots, x_n^{**}$, whose brackets and
polykays we shall similarly distinguish with asterisks, as, for example, $\langle 2 \rangle^*$
and $(2)^{**}$. We shall be concerned with the results of pairing these two sets
randomly, more specifically with the results of forming some function of each
of the pairs

$[x_1^*, x_{\pi(1)}^{**}], [x_2^*, x_{\pi(2)}^{**}], \ldots, [x_n^*, x_{\pi(n)}^{**}]$

where $\pi(1), \pi(2), \ldots, \pi(n)$ is a permutation of the integers $1, 2, \ldots, n$, and
where we shall eventually wish to average over all permutations.
The simplest pairing operation is multiplicative pairing, where $z_i = x_i^* x_{\pi(i)}^{**}$.
Let us calculate a moment of the resulting $z_i$, say $\langle a \rangle$, and then average over
all pairings [permutations]. We have

$\langle a \rangle = \dfrac{\sum_i (x_i^*)^a (x_{\pi(i)}^{**})^a}{n}$

and when we average, the product of $x_i^*$ and $x_j^{**}$ appears equally often for all
pairs $i$ and $j$, so that

$\mathrm{aver}\{\langle a \rangle\} = \dfrac{\sum_i (x_i^*)^a \cdot \sum_j (x_j^{**})^a}{n^2} = \langle a \rangle^* \langle a \rangle^{**}$

where we have written "aver" for the average over random pairing, as we shall
continue to do, and where the denominator of $n^2$ is easily justified as equal
to the number of terms in the numerator when expanded (an average of a mean
is again a mean).

The same argument applies to $\langle ab \cdots e \rangle$, as we see if we write

$y_{ij \cdots m} = z_i^a z_j^b \cdots z_m^e$
$y_{ij \cdots m}^* = (x_i^*)^a (x_j^*)^b \cdots (x_m^*)^e$
$y_{ij \cdots m}^{**} = (x_i^{**})^a (x_j^{**})^b \cdots (x_m^{**})^e$

and observe that

$y_{ij \cdots m} = y_{ij \cdots m}^* \, y_{\pi(i), \pi(j), \cdots, \pi(m)}^{**}$.

Thus we have

$\mathrm{aver}\{\langle ab \cdots e \rangle\} = \langle ab \cdots e \rangle^* \langle ab \cdots e \rangle^{**}$.

The brackets are ideally suited to multiplicative pairing.
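The multiplicative pairing rule can be confirmed by direct enumeration for small $n$. The sketch below averages a bracket of the products $z_i$ over all $n!$ pairings and compares the result with the product of the corresponding brackets of the two sets. The function names and the two sets of numbers are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def bracket(xs, *powers):
    """Symmetric mean <ab...e> over ordered selections of distinct indices."""
    total, count = Fraction(0), 0
    for idx in permutations(range(len(xs)), len(powers)):
        term = Fraction(1)
        for i, p in zip(idx, powers):
            term *= Fraction(xs[i]) ** p
        total += term
        count += 1
    return total / count


def aver_multiplicative(xs1, xs2, *powers):
    """Average of <ab...e> of z_i = x*_i x**_pi(i) over all permutations pi."""
    n = len(xs1)
    values = [bracket([xs1[i] * xs2[pi[i]] for i in range(n)], *powers)
              for pi in permutations(range(n))]
    return sum(values) / len(values)


# Hypothetical sets of n = 4 numbers.
xs1 = [1, 2, 4, 7]
xs2 = [3, 0, 5, 2]
averaged = aver_multiplicative(xs1, xs2, 1, 2)
product = bracket(xs1, 1, 2) * bracket(xs2, 1, 2)
```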

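The only other random pairing which we know how to handle at all well
statistically is the additive one, where

$z_i = x_i^* + x_{\pi(i)}^{**}$.

Because of the many statistical problems where additive pairing is taken as the
first approximation to reality (classical analysis of variance models, error
propagation, etc.), this is the most important case.

The one-index (or as we would say in Section 11, one-part) brackets behave
in a manageable, but not simple way under additive random pairing. We have,
for example,

$\langle\langle 3 \rangle\rangle = \mathrm{aver}\{\langle 3 \rangle\} = \mathrm{aver}\, \dfrac{\sum_i (x_i^* + x_{\pi(i)}^{**})^3}{n}$
    $= \mathrm{aver}\, \dfrac{\sum_i (x_i^*)^3}{n} + 3\, \mathrm{aver}\, \dfrac{\sum_i (x_i^*)^2 (x_{\pi(i)}^{**})}{n} + 3\, \mathrm{aver}\, \dfrac{\sum_i (x_i^*)(x_{\pi(i)}^{**})^2}{n} + \mathrm{aver}\, \dfrac{\sum_i (x_{\pi(i)}^{**})^3}{n}$
    $= \langle 3 \rangle^* + 3\langle 2 \rangle^* \langle 1 \rangle^{**} + 3\langle 1 \rangle^* \langle 2 \rangle^{**} + \langle 3 \rangle^{**}$

and, in general,

$\langle\langle j \rangle\rangle = \mathrm{aver}\{\langle j \rangle\} = \langle j \rangle^* + \binom{j}{1} \langle j - 1 \rangle^* \langle 1 \rangle^{**} + \binom{j}{2} \langle j - 2 \rangle^* \langle 2 \rangle^{**} + \cdots + \langle j \rangle^{**}$

where we have introduced a doubling of the brackets to indicate averaging over
an additive random pairing.

Since the general formula is of binomial type, we can represent it in terms
of generating functions. If we define

$M_{\mathrm{aver}}(t) = 1 + \langle\langle 1 \rangle\rangle t + \langle\langle 2 \rangle\rangle \dfrac{t^2}{2!} + \langle\langle 3 \rangle\rangle \dfrac{t^3}{3!} + \cdots$

$M^*(t) = 1 + \langle 1 \rangle^* t + \langle 2 \rangle^* \dfrac{t^2}{2!} + \langle 3 \rangle^* \dfrac{t^3}{3!} + \cdots$

$M^{**}(t) = 1 + \langle 1 \rangle^{**} t + \langle 2 \rangle^{**} \dfrac{t^2}{2!} + \langle 3 \rangle^{**} \dfrac{t^3}{3!} + \cdots$

then the general formula becomes

$M_{\mathrm{aver}}(t) = M^*(t) M^{**}(t)$.

We shall use this relation in Section 11 to obtain general expressions defining
the second system of quantities in relation to the brackets.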
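The binomial formula for one-part brackets under additive pairing can likewise be checked by enumerating all pairings of two small sets. In the sketch below the function names and the numbers are hypothetical; `binomial_formula` evaluates the right-hand side of the general formula and `aver_additive_moment` the left-hand side.

```python
from fractions import Fraction
from itertools import permutations
from math import comb


def moment(xs, j):
    """One-part bracket <j>: the j-th moment about zero."""
    return sum(Fraction(x) ** j for x in xs) / len(xs)


def aver_additive_moment(xs1, xs2, j):
    """Average of <j> for z_i = x*_i + x**_pi(i) over all permutations pi."""
    n = len(xs1)
    values = [moment([xs1[i] + xs2[pi[i]] for i in range(n)], j)
              for pi in permutations(range(n))]
    return sum(values) / len(values)


def binomial_formula(xs1, xs2, j):
    """Sum over r of C(j, r) <j - r>* <r>**, with <0> equal to 1."""
    return sum(comb(j, r) * moment(xs1, j - r) * moment(xs2, r)
               for r in range(j + 1))


# Hypothetical sets of n = 4 numbers.
xs1 = [1, 2, 4, 7]
xs2 = [3, 0, 5, 2]
lhs3 = aver_additive_moment(xs1, xs2, 3)
rhs3 = binomial_formula(xs1, xs2, 3)
```

The equality of coefficients of $t^j/j!$ in $M_{\mathrm{aver}}(t) = M^*(t)M^{**}(t)$ is exactly the identity tested here, one value of $j$ at a time.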


5. The polykays. This second system will be denoted either by $k_{ab\cdots e}$ or by
$(ab \cdots e)$ as may be convenient. We shall use the same double parenthesis
convention, so that

$$((ab \cdots e)) = \operatorname{aver}\{(ab \cdots e)\}$$


where the averaging is over additive random pairing. 

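Two examples of the second system, beyond the trivial $(1) = \langle 1 \rangle$, are
$(2) = \langle 2 \rangle - \langle 11 \rangle$ and $(12) = \langle 12 \rangle - \langle 111 \rangle$. Let us examine the
behavior of these quantities under random pairing, using unproven, but formally
reasonable, facts about the behavior of multipart brackets. We find

$$\begin{aligned}
((2)) &= \langle\langle 2 \rangle\rangle - \langle\langle 11 \rangle\rangle \\
&= \langle 2 \rangle^* + 2\langle 1 \rangle^*\langle 1 \rangle^{**} + \langle 2 \rangle^{**} - \langle 11 \rangle^* - 2\langle 1 \rangle^*\langle 1 \rangle^{**} - \langle 11 \rangle^{**} \\
&= [\langle 2 \rangle^* - \langle 11 \rangle^*] + [\langle 2 \rangle^{**} - \langle 11 \rangle^{**}] = (2)^* + (2)^{**}, \\
((12)) &= \langle\langle 12 \rangle\rangle - \langle\langle 111 \rangle\rangle \\
&= \langle 12 \rangle^* + \langle 2 \rangle^*\langle 1 \rangle^{**} + 2\langle 11 \rangle^*\langle 1 \rangle^{**} + \langle 1 \rangle^*\langle 2 \rangle^{**} + 2\langle 1 \rangle^*\langle 11 \rangle^{**} + \langle 12 \rangle^{**} \\
&\quad - \langle 111 \rangle^* - 3\langle 11 \rangle^*\langle 1 \rangle^{**} - 3\langle 1 \rangle^*\langle 11 \rangle^{**} - \langle 111 \rangle^{**} \\
&= [\langle 12 \rangle^* - \langle 111 \rangle^*] + [\langle 2 \rangle^* - \langle 11 \rangle^*]\langle 1 \rangle^{**} + \langle 1 \rangle^*[\langle 2 \rangle^{**} - \langle 11 \rangle^{**}] + [\langle 12 \rangle^{**} - \langle 111 \rangle^{**}] \\
&= (12)^* + (2)^*(1)^{**} + (1)^*(2)^{**} + (12)^{**}.
\end{aligned}$$
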
It should now be clear both what the pairing law is likely to be, and that we
need some slick trick both to discover the definitions and to prove the result.
Since the trick the author prefers involves symbolic multiplication, we shall
postpone its treatment to the last section. We announce here the pairing formula
we desire and leave definitions and proofs to Sections 11 and 12. The pairing
formula is illustrated by

$$\begin{aligned}
\operatorname{aver}\{k_{abc}\} = ((abc)) &= (abc)^* + (ab)^*(c)^{**} + (ac)^*(b)^{**} + (bc)^*(a)^{**} \\
&\quad + (a)^*(bc)^{**} + (b)^*(ac)^{**} + (c)^*(ab)^{**} + (abc)^{**}.
\end{aligned}$$


In general, we separate the indices into two sets (one of which may be empty) in
all possible ways, assigning one set to * and the other to **, and adding up the
resulting products. We know that we have the desired definitions when the
pairing formula is an identity with $((ab \cdots e))$ the same function of the
$\langle\langle fg \cdots j \rangle\rangle$ as $(ab \cdots e)^*$ is of the $\langle fg \cdots j \rangle^*$ and as
$(ab \cdots e)^{**}$ is of the $\langle fg \cdots j \rangle^{**}$.
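
The rule of splitting the indices into two sets in all possible ways is easy to mechanize. The following is a minimal sketch, with a hypothetical function name, that lists the terms of the pairing formula for a polykay and merges terms that coincide when indices are repeated.

```python
from collections import Counter
from itertools import product


def pairing_terms(indices):
    """Terms of ((indices)): each index goes either to the * set or to the ** set.

    Returns {(star_indices, double_star_indices): multiplicity}.
    """
    counts = Counter()
    for assignment in product(("*", "**"), repeat=len(indices)):
        star = tuple(sorted(a for a, s in zip(indices, assignment) if s == "*"))
        dstar = tuple(sorted(a for a, s in zip(indices, assignment) if s == "**"))
        counts[(star, dstar)] += 1
    return dict(counts)


# Three distinct indices give the eight terms of the illustration for ((abc)).
for (star, dstar), mult in pairing_terms(("a", "b", "c")).items():
    print(mult, star, dstar)

# Repeated indices merge: ((112)) contains 2 (12)*(1)** and 2 (1)*(12)**.
print(pairing_terms((1, 1, 2)))
```

An empty tuple stands for the set that received no indices, whose factor is taken to be 1.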

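One special case is worthy of notice. If all the $x_i^{**}$ are identically equal
to $\delta$, then (cp. [7])

$$\langle a \rangle^{**} = \delta^a; \quad \langle ab \rangle^{**} = \delta^{a+b}; \quad \langle abc \rangle^{**} = \delta^{a+b+c}, \cdots; \quad (1)^{**} = \delta;$$

$$(a)^{**} = 0 \ \text{for } a > 1; \quad 0 = (ab)^{**} = (abc)^{**} = (abcd)^{**} = \cdots,$$

the multipart polykays vanishing whenever at least one index exceeds 1, while
$(11)^{**} = \delta^2$, $(111)^{**} = \delta^3$, and so on. So that the effect of
pairing with this population, which is independent of the randomization and
exactly equivalent to increasing all the $x_i$ by $\delta$, is given by such
relations as

$$\begin{aligned}
((11)) &= (11)^* + 2\delta(1)^* + \delta^2, \\
((12)) &= (12)^* + \delta(2)^*, \\
((111)) &= (111)^* + 3\delta(11)^* + 3\delta^2(1)^* + \delta^3, \\
((112)) &= (112)^* + 2\delta(12)^* + \delta^2(2)^*, \\
((22)) &= (22)^*.
\end{aligned}$$

Thus the effects of increasing all the $x_i$ by $\delta$ are easily found. We notice
for future use that the highest power of $\delta$ is the number of 1's in the
polykay, and that it appears with a coefficient found by dropping the 1's from
within the parentheses.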


6. Inheritance and representation. If $x_1, x_2, \cdots, x_n$ is a sample from
$x'_1, x'_2, \cdots, x'_N$, and if "ave" means a simple average over all samples,
then symmetry implies that

$$\operatorname{ave}\{\langle ab \cdots e \rangle\}
= \frac{\operatorname{ave}\{\sum^{\ne} x_i^a x_j^b \cdots x_m^e\}}{\text{number of terms above}}
= \frac{\sum^{\ne} (x'_i)^a (x'_j)^b \cdots (x'_m)^e}{\text{number of terms above}}
= \langle ab \cdots e \rangle'.$$

Since the polykays are expressible linearly in the brackets with integer coefficients
the same relation

$$\operatorname{ave}\{k_{ab\cdots e}\} = k'_{ab\cdots e}$$


must hold for polykays. We refer to this as inheritance on the average. Some 
would prefer to say unbiasedness (but there are now so many kinds of un- 
biasedness! ). 

Since any polynomial symmetric function can be expressed linearly in terms 
of the brackets, and since, as we shall see, brackets and polykays can be expressed 
linearly in terms of one another, every polynomial symmetric function can also 
be expressed linearly in terms of polykays. 
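
Inheritance on the average can be seen directly on a small population. The sketch below uses hypothetical data and function names: it computes a bracket as an average over tuples of distinct indices, averages it over all samples of size n, and compares the result with the same bracket of the population.

```python
from fractions import Fraction
from itertools import combinations, permutations


def bracket(values, parts):
    """Symmetric mean <parts>: average of x_i^a x_j^b ... over distinct i, j, ..."""
    total, n_terms = Fraction(0), 0
    for idx in permutations(range(len(values)), len(parts)):
        term = Fraction(1)
        for i, a in zip(idx, parts):
            term *= Fraction(values[i]) ** a
        total += term
        n_terms += 1
    return total / n_terms


def average_over_samples(population, n, parts):
    """'ave': simple average of the sample bracket over all samples of size n."""
    samples = list(combinations(population, n))
    return sum(bracket(s, parts) for s in samples) / len(samples)


population = [1, 2, 4, 7, 11]   # hypothetical population, N = 5
for parts in [(1,), (2,), (1, 1), (1, 2), (1, 1, 2)]:
    print(parts, average_over_samples(population, 3, parts), bracket(population, parts))
```

Because a polykay such as $(2) = \langle 2 \rangle - \langle 11 \rangle$ is a fixed linear combination of brackets, the same equality then holds for it term by term.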


7. Unit parts. Each index "1" which appears in a bracket or a polykay will be
called a unit part. These indices play an especially simple role. If we have a
linear identity connecting polykays and brackets, we can obtain a new one
by adding unit parts to all the terms. Thus from
$k_3 = (3) = \langle 3 \rangle - 3\langle 12 \rangle + 2\langle 111 \rangle$ we derive
$k_{13} = (13) = \langle 13 \rangle - 3\langle 112 \rangle + 2\langle 1111 \rangle$,
$k_{113} = (113) = \langle 113 \rangle - 3\langle 1112 \rangle + 2\langle 11111 \rangle$ and so on, while from
$\langle 2 \rangle = (2) + (11)$ we derive
$\langle 12 \rangle = (12) + (111) = k_{12} + k_{111}$,
$\langle 112 \rangle = (112) + (1111) = k_{112} + k_{1111}$ and
so on. This fact will be proven in Section 12.

Thus it would be convenient to write these formulas in a single shorthand
form, such as $(\text{--}3) = \langle \text{--}3 \rangle - 3\langle \text{--}2 \rangle + 2\langle \text{--} \rangle$,
$\langle \text{--}2 \rangle = (\text{--}2) + (\text{--})$, where
the dashes stand for the number of 1's. This number is usually different in
the different appearances in one formula, but these numbers are to be chosen
so as to make all terms of the same degree.

For further abbreviation, we may drop the dashes themselves (except in
$\langle \text{--} \rangle$ and $(\text{--})$ where they seem helpful). If we do this, then we can give a
single table which presents the coefficients for all the identities connecting
brackets and polykays through degree 8. This is done in a compact form (pioneered
by David and Kendall [2]) in Table 1, where brackets are expressed in terms of
polykays by the coefficients below and on the main diagonal, while polykays are
expressed in terms of brackets by the coefficients above and on the main diagonal.
(A less convenient table, extending through weight 12, has recently been provided
by Abdel-Aty [1].)

More specifically, to express $\langle 1134 \rangle$ in terms of polykays, we proceed as
follows:

(a) look for $\langle 34 \rangle$, which identifies a row, and start with the heavy 1 on the
diagonal of that row,

(b) move along that row from the diagonal 1 toward the beginning, writing
down the coefficients times the corresponding polykays. (This yields:

$$\langle 34 \rangle = (34) + 3(223) + 4(33) + 3(24) + 9(222) + 18(23) + (4) + 21(22) + 5(3) + 9(2) + (\text{--})$$

in shorthand notation.)

(c) note that 1 + 1 + 3 + 4 = 9 and add 1's to every term to bring the
degree of every term up to 9. (This yields:

$$\begin{aligned}
\langle 1134 \rangle &= (1134) + 3(11223) + 4(11133) + 3(11124) + 9(111222) \\
&\quad + 18(111123) + (111114) + 21(1111122) \\
&\quad + 5(1111113) + 9(11111112) + (111111111)
\end{aligned}$$

which is the desired result.)


To expand a polykay in brackets we merely interchange rows and columns, 
moving upward from the diagonal. 
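
The row of coefficients read off in steps (a) to (c) can be reproduced without the table. The sketch below is not the table-lookup procedure itself: it assumes the o-exponential relation established in Section 13, in the standard set-partition form of the moment-cumulant formula, so that a one-part bracket is a sum of one polykay per set partition of its index, and a multipart bracket is the o-product of its one-part brackets. All function names are hypothetical.

```python
from collections import Counter


def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        yield [[first]] + partition                       # `first` in a block alone
        for i in range(len(partition)):                   # or added to an existing block
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


def one_part_bracket(a):
    """<a> as {polykay indices: coefficient}, one term per set partition."""
    out = Counter()
    for partition in set_partitions(list(range(a))):
        out[tuple(sorted(len(block) for block in partition))] += 1
    return out


def bracket_in_polykays(parts):
    """<ab...e> = <a> o <b> o ... o <e>; o-products of polykays combine indices."""
    result = Counter({(): 1})
    for a in parts:
        step = Counter()
        for left, c_left in result.items():
            for right, c_right in one_part_bracket(a).items():
                step[tuple(sorted(left + right))] += c_left * c_right
        result = step
    return dict(result)


def drop_unit_parts(expansion):
    """Shorthand form: remove the 1's from every polykay index."""
    return {tuple(i for i in key if i != 1): c for key, c in expansion.items()}


print(drop_unit_parts(bracket_in_polykays((3, 4))))
print(bracket_in_polykays((1, 1, 3, 4)))
```

Dropping the unit parts from the expansion of $\langle 34 \rangle$ gives the shorthand row of step (b), and the expansion of $\langle 1134 \rangle$ is the same list with 1's added as in step (c).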


8. Computation modulo unit parts. We see easily, either from the general
relations of Sections 11 and 12, or from the nature of the reduced formulas, that
when a bracket with g unit parts is written in terms of polykays, only polykays
with at least g unit parts appear and vice versa. It is then unequivocal if we write
$O(1^g)$ for an arbitrary set of terms each of which, when expanded linearly in
brackets or polykays contains at least g unit parts.

TABLE 1
Conversion table for brackets and polykays (to use, pivot on the diagonal of ones).
The entries of the table are not legible in this copy.

If we have an identity which is a polynomial in the polykays, as for example

$$k_1^3 - k_2 k_1 + \frac{1}{n} k_3 + k_{12} = \frac{1}{n^2} k_3 + \frac{3}{n} k_{12} + k_{111},$$

each term has a certain total number of unit parts. In the example, these numbers
are 3, 1, 0 and 1 on the left-hand side and 0, 1 and 3 on the right-hand side.
The highest total appearing on either side is the unit weight of that side. In the
example, the unit weight of each side is 3. If we shift all the $x_i$ by pairing them
with a set of values all equal to $\delta$, each term will be replaced by a number of
terms involving various powers of $\delta$ up to and including a power equal to the
number of unit parts. The total coefficients of each power of $\delta$ must also give an
identity. Thus any identity gives rise to a number of associated identities.
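
An identity of this kind, and the effect of a shift by $\delta$, can be checked numerically. The sketch below is an illustration under stated assumptions: the data are hypothetical, polykays are evaluated from their bracket expansions using the set-partition form of the cumulant-moment relation (equivalent to the o-logarithm of Section 11), and the identity tested is $k_1^3 - k_2 k_1 + k_3/n + k_{12} = k_3/n^2 + 3k_{12}/n + k_{111}$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def bracket(values, parts):
    """Symmetric mean <parts>: average of x_i^a x_j^b ... over distinct i, j, ..."""
    total, n_terms = Fraction(0), 0
    for idx in permutations(range(len(values)), len(parts)):
        term = Fraction(1)
        for i, a in zip(idx, parts):
            term *= Fraction(values[i]) ** a
        total += term
        n_terms += 1
    return total / n_terms


def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        yield [[first]] + partition
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


def one_part_k(j):
    """k_j in brackets: sum over set partitions with sign (-1)^(m-1) (m-1)!."""
    out = {}
    for partition in set_partitions(list(range(j))):
        m = len(partition)
        key = tuple(sorted(len(block) for block in partition))
        out[key] = out.get(key, 0) + (-1) ** (m - 1) * factorial(m - 1)
    return out


def polykay(values, parts):
    """Numerical value of the polykay (parts) = (a) o (b) o ... for the given data."""
    expansion = {(): 1}
    for a in parts:
        step = {}
        for left, c_left in expansion.items():
            for right, c_right in one_part_k(a).items():
                key = tuple(sorted(left + right))
                step[key] = step.get(key, 0) + c_left * c_right
        expansion = step
    return sum(c * bracket(values, key) for key, c in expansion.items())


x = [1, 2, 4, 7, 11]                      # hypothetical data, n = 5
n = len(x)
k = lambda *parts: polykay(x, parts)
lhs = k(1) ** 3 - k(2) * k(1) + k(3) / n + k(1, 2)
rhs = k(3) / n ** 2 + 3 * k(1, 2) / n + k(1, 1, 1)
print(lhs, rhs)

delta = 10                                # hypothetical shift
shifted = [v + delta for v in x]
print(polykay(shifted, (1, 2)), k(1, 2) + delta * k(2))
```

The last line illustrates the shift relation $((12)) = (12)^* + \delta (2)^*$: the power of $\delta$ that appears is bounded by the number of unit parts.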

If one side of the initial identity was linear in the polykays, and all obvious
cancellations had been made initially (as is the case on the right-hand side of
the example), then the coefficient of the highest power of $\delta$ appearing on that
side after pairing would be linear in the polykays, would have no obvious
cancellations, and hence would not vanish identically. The coefficient of the same
power of $\delta$ on the other side, which is identically equal to this, cannot vanish,
and hence

(Unit weight on other side) $\ge$ (Unit weight on linear side).

Since any polynomial in the polykays is a symmetric polynomial in the $x$'s, it
can be written linearly in the polykays. The unit weight will not be increased
in this process. In particular, a polynomial in polykays without unit part (and
hence of unit weight zero) when written linearly in the polykays involves no
polykay with unit part. If we know the linear representation to terms $O(1)$, we
know it exactly. Similarly, the unit weight of the left-hand side in the example
is 3. If we know the linear representation (the right-hand side) to $O(1^4)$, we
know it exactly.

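In Table 1 we have both heavy and light coefficients. Only the heavy ones
need to be used if we adjoin $+O(1)$. Thus, for example,

$$\begin{aligned}
\langle 34 \rangle &= (34) + 3(223) + O(1), \\
(34) &= \langle 34 \rangle - 3\langle 223 \rangle + O(1), \\
\langle 223 \rangle &= (223) + O(1).
\end{aligned}$$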


As this example shows, the formulas are often much simplified by this process. 


9. Multiplication of brackets. Both brackets and polykays are chosen so as to 
remove the inevitable combinatorial difficulties from as many formulas as pos- 
sible. As a result, combinatorial considerations have been restricted to the 
formulas for multiplication. For brackets, the resulting formulas are relatively 
simple. Thus 





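$$\begin{aligned}
\langle a \rangle \langle b \rangle &= \frac{1}{n}\sum_i x_i^a \cdot \frac{1}{n}\sum_j x_j^b
= \frac{1}{n^2}\Big[\sum^{\ne} x_i^a x_j^b + \sum_i x_i^{a+b}\Big] \\
&= \frac{1}{n^2}\big[n(n-1)\langle ab \rangle + n\langle a+b \rangle\big]
= \frac{n-1}{n}\langle ab \rangle + \frac{1}{n}\langle a+b \rangle.
\end{aligned}$$

In a similar way we find

$$\begin{aligned}
\langle abc \rangle \langle de \rangle &= \frac{(n-3)(n-4)}{n(n-1)} \langle abcde \rangle \\
&\quad + \frac{n-3}{n(n-1)} \big[ \langle a+d, bce \rangle + \langle a+e, bcd \rangle + \langle b+d, ace \rangle
 + \langle b+e, acd \rangle + \langle c+d, abe \rangle + \langle c+e, abd \rangle \big] \\
&\quad + \frac{1}{n(n-1)} \big[ \langle a+d, b+e, c \rangle + \langle a+e, b+d, c \rangle + \langle a+d, c+e, b \rangle \\
&\qquad\qquad + \langle a+e, c+d, b \rangle + \langle b+d, c+e, a \rangle + \langle c+d, b+e, a \rangle \big].
\end{aligned}$$
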
In general, we obtain all brackets which can be obtained by matching some 
(including none) of the letters in one bracket with letters in the other and then 
replacing matched letters by their sum. The coefficient is a simple function 
of the number of parts in the factors and the result, with a simple denominator. 
It is often convenient to expand coefficients as integer coefficient combinations of

$$p = \frac{1}{n}, \quad q = \frac{1}{n(n-1)}, \quad r = \frac{1}{n(n-1)(n-2)}, \quad s = \frac{1}{n(n-1)(n-2)(n-3)}.$$


These expansions are given for certain products in Table 2. With the aid of this 
table, multiplication of brackets is merely a matter of exerting moderate patience 
to be sure that you have all the terms. 
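
The matching rule lends itself to a mechanical check of "having all the terms". The sketch below is one possible reading of the rule, with hypothetical function names: every way of matching m letters of one bracket with m letters of the other contributes the merged bracket, with a coefficient equal to the number of index tuples of the result divided by the product of the numbers of index tuples of the factors. It assumes n is at least as large as the number of parts in each factor.

```python
from fractions import Fraction
from itertools import combinations, permutations


def falling(n, m):
    """Falling factorial n(n-1)...(n-m+1)."""
    out = 1
    for i in range(m):
        out *= n - i
    return out


def multiply_brackets(a, b, n):
    """Expand <a><b> as {parts: coefficient} for a sample of size n."""
    out = {}
    r, s = len(a), len(b)
    for m in range(min(r, s) + 1):                    # m = number of matched pairs
        coef = Fraction(falling(n, r + s - m), falling(n, r) * falling(n, s))
        for ia in combinations(range(r), m):          # matched letters of a
            for ib in permutations(range(s), m):      # their partners in b
                merged = [a[i] + b[j] for i, j in zip(ia, ib)]
                rest = [a[i] for i in range(r) if i not in ia]
                rest += [b[j] for j in range(s) if j not in ib]
                key = tuple(sorted(merged + rest))
                out[key] = out.get(key, 0) + coef
    return out


def bracket(values, parts):
    """Symmetric mean <parts>, used here only to check the expansion numerically."""
    total, n_terms = Fraction(0), 0
    for idx in permutations(range(len(values)), len(parts)):
        term = Fraction(1)
        for i, e in zip(idx, parts):
            term *= Fraction(values[i]) ** e
        total += term
        n_terms += 1
    return total / n_terms


n = 10
p, q = Fraction(1, n), Fraction(1, n * (n - 1))
print(multiply_brackets((1, 2), (2,), n))        # [1-2p]<122> + p<23> + p<14>
print(multiply_brackets((1, 1, 1), (1, 1), n))   # [1-6p+6q]<11111> + 6[p-2q]<1112> + 6q<122>
```

The coefficients printed for n = 10 can be compared with the entries of Table 2 for the cases 1 x 2 and 2 x 3.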


TABLE 2
Expansions of factors for bracket multiplication

Parts of  |           Parts in bracket whose coefficient is sought
factors   |  1  |   2   |   3    |      4      |       5       |         6
1 x 1     |  p  | 1 - p |        |             |               |
1 x 2     |     |   p   | 1 - 2p |             |               |
1 x 3     |     |       |   p    |   1 - 3p    |               |
1 x 4     |     |       |        |      p      |    1 - 4p     |
2 x 2     |     |   q   | p - q  | 1 - 4p + 2q |               |
2 x 3     |     |       |   q    |   p - 2q    | 1 - 6p + 6q   |
2 x 4     |     |       |        |      q      |    p - 3q     | 1 - 8p + 12q
3 x 3     |     |       |   r    |    q - r    | p - 4q + 2r   | 1 - 9p + 18q - 6r
3 x 4     |     |       |        |      r      |    q - 2r     | p - 6q + 6r
4 x 4     |     |       |        |      s      |    r - s      | q - 4r + 2s


10. Multiplication of polykays. The multiplication formulas for polykays are
more complex. They can be obtained symbolically (cp. Wishart [10], Kendall
[5]) or by direct calculation. One way to carry out direct calculation is to express
each polykay in brackets, multiply out the brackets and then reconvert the
resulting brackets to polykays. For example


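$$\begin{aligned}
k_{12} k_2 &= (12)(2) = [\langle 12 \rangle - \langle 111 \rangle][\langle 2 \rangle - \langle 11 \rangle] \\
&= \langle 12 \rangle\langle 2 \rangle - \langle 111 \rangle\langle 2 \rangle - \langle 12 \rangle\langle 11 \rangle + \langle 111 \rangle\langle 11 \rangle \\
&= [1 - 2p]\langle 122 \rangle + p\langle 23 \rangle + p\langle 14 \rangle - [1 - 3p]\langle 1112 \rangle \\
&\quad - 3p\langle 113 \rangle - [1 - 4p + 2q]\langle 1112 \rangle - 2[p - q]\langle 122 \rangle \\
&\quad - 2[p - q]\langle 113 \rangle - 2q\langle 23 \rangle + [1 - 6p + 6q]\langle 11111 \rangle \\
&\quad + 6[p - 2q]\langle 1112 \rangle + 6q\langle 122 \rangle \\
&= [p - 2q]\langle 23 \rangle + p\langle 14 \rangle + [1 - 4p + 8q]\langle 122 \rangle \\
&\quad + [-5p + 2q]\langle 113 \rangle + [-2 + 13p - 14q]\langle 1112 \rangle \\
&\quad + [1 - 6p + 6q]\langle 11111 \rangle \\
&= [p - 2q][(23) + 3(122) + (113) + 4(1112) + (11111)] \\
&\quad + p[(14) + 3(122) + 4(113) + 6(1112) + (11111)] \\
&\quad + [1 - 4p + 8q][(122) + 2(1112) + (11111)] \\
&\quad + [-5p + 2q][(113) + 3(1112) + (11111)] \\
&\quad + [-2 + 13p - 14q][(1112) + (11111)] + [1 - 6p + 6q](11111) \\
&= [p - 2q](23) + p(14) + [1 + 2p + 2q](122) + 0(113) - 0(1112) + 0(11111) \\
&= \frac{n-3}{n(n-1)} k_{23} + \frac{1}{n} k_{14} + \frac{n+1}{n-1} k_{122}.
\end{aligned}$$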




Even for this case, some care in computation is advisable. Clearly direct com- 
putation should be avoided to the greatest extent possible. 

In some cases, it is possible to obtain a substantial saving in computation
by calculating modulo unit parts in a suitable sense. Thus in the example just
given we may neglect terms $O(1^2)$. Thus we could have written the products of
the brackets as

$$\begin{aligned}
&[1 - 2p]\langle 122 \rangle + p\langle 23 \rangle + p\langle 14 \rangle - 2(p - q)\langle 122 \rangle - 2q\langle 23 \rangle + 6q\langle 122 \rangle + O(1^2) \\
&\quad = [1 - 4p + 8q][(122) + O(1^2)] + [p - 2q][(23) + 3(122) + O(1^2)] \\
&\qquad + p[(14) + 3(122) + O(1^2)] + O(1^2),
\end{aligned}$$

which avoids a certain amount of algebra.
As a more complex example, let us take 


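$$\begin{aligned}
k_{22} k_{22} = (22)(22) &= (\langle 22 \rangle - 2\langle 112 \rangle + \langle 1111 \rangle)(\langle 22 \rangle - 2\langle 112 \rangle + \langle 1111 \rangle) \\
&= \langle 22 \rangle\langle 22 \rangle - 4\langle 22 \rangle\langle 112 \rangle + 4\langle 112 \rangle\langle 112 \rangle - 4\langle 112 \rangle\langle 1111 \rangle \\
&\quad + 2\langle 22 \rangle\langle 1111 \rangle + \langle 1111 \rangle\langle 1111 \rangle
\end{aligned}$$

where

$$\begin{aligned}
\langle 22 \rangle\langle 22 \rangle &= [1 - 4p + 2q]\langle 2222 \rangle + 4[p - q]\langle 224 \rangle + 2q\langle 44 \rangle, \\
\langle 22 \rangle\langle 112 \rangle &= 2q\langle 233 \rangle + O(1), \\
\langle 112 \rangle\langle 112 \rangle &= 2[q - r]\langle 2222 \rangle + 2r\langle 224 \rangle + 4r\langle 233 \rangle + O(1), \\
\langle 22 \rangle\langle 1111 \rangle &= O(1), \\
\langle 112 \rangle\langle 1111 \rangle &= O(1), \\
\langle 1111 \rangle\langle 1111 \rangle &= 24s\langle 2222 \rangle + O(1),
\end{aligned}$$

so that

$$\begin{aligned}
k_{22} k_{22} &= [1 - 4p + 10q - 8r + 24s]\langle 2222 \rangle + [4p - 4q + 8r]\langle 224 \rangle \\
&\quad + [-8q + 16r]\langle 233 \rangle + 2q\langle 44 \rangle + O(1).
\end{aligned}$$

Since

$$\begin{aligned}
\langle 44 \rangle &= (44) + 6(224) + 9(2222) + O(1), \\
\langle 224 \rangle &= (224) + 3(2222) + O(1), \\
\langle 233 \rangle &= (233) + O(1), \\
\langle 2222 \rangle &= (2222) + O(1),
\end{aligned}$$

we obtain

$$\begin{aligned}
k_{22} k_{22} &= [1 + 8p + 16q + 16r + 24s](2222) + [4p + 8q + 8r](224) \\
&\quad + [-8q + 16r](233) + 2q(44) \\
&= \Big[1 + \frac{8}{n-2} + \frac{24}{n(n-1)(n-2)(n-3)}\Big] k_{2222} + \frac{4}{n-2} k_{224} \\
&\quad - \frac{8(n-4)}{n(n-1)(n-2)} k_{233} + \frac{2}{n(n-1)} k_{44}.
\end{aligned}$$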
This result agrees with $k_{22} k_{22}$ as obtained from Wishart's formulas in the form

$$k_{22} = \frac{n-1}{n+1} k_2^2 - \frac{n-1}{n(n+1)} k_4,$$

$$k_{22} k_{22} = \Big(\frac{n-1}{n+1}\Big)^2 k_2^4 - \frac{2(n-1)^2}{n(n+1)^2} k_2^2 k_4 + \frac{(n-1)^2}{n^2(n+1)^2} k_4^2,$$

and thus provides an additional check on Wishart's result.

Many of Wishart’s formulas were independently obtained by the writer before 
their publication by Wishart. In most cases agreement was good, and in the 
others the writer’s algebra proved at fault. 

A few of the simplest are now given for easy reference:

$$k_2^2 = k_{22} + \frac{1}{n} k_4 + \frac{2}{n-1} k_{22},$$

$$k_1 k_a = k_{1a} + \frac{1}{n} k_{a+1},$$

$$k_1 k_{ab} = k_{1ab} + \frac{1}{n} k_{a+1,b} + \frac{1}{n} k_{a,b+1},$$

$$k_2 k_3 = k_{23} + \frac{1}{n} k_5 + \frac{6}{n-1} k_{23}.$$

For others the reader is referred to Wishart [10]. If products of weights greater
than 8 are ever needed, it is very probable that terms in $1/n$, or perhaps through
$1/n^2$, will suffice. In such cases, an extension of Table 2, neglecting $r, s, t, \cdots$
(and $q$ if terms in $1/n$ will suffice) forms the basis of a simple method of
calculation.
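
Since some care in computation is advisable, multiplication formulas of this kind are worth checking on actual numbers. The sketch below uses hypothetical data and evaluates polykays from their bracket expansions, with the set-partition form of the cumulant-moment relation standing in for the o-logarithm of Section 11. It tests the worked product $k_{12} k_2$ and the reference formulas in the forms $k_2^2 = k_{22} + k_4/n + 2k_{22}/(n-1)$, $k_1 k_a = k_{1a} + k_{a+1}/n$, $k_1 k_{ab} = k_{1ab} + k_{a+1,b}/n + k_{a,b+1}/n$ and $k_2 k_3 = k_{23} + k_5/n + 6k_{23}/(n-1)$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def bracket(values, parts):
    """Symmetric mean <parts>: average of x_i^a x_j^b ... over distinct i, j, ..."""
    total, n_terms = Fraction(0), 0
    for idx in permutations(range(len(values)), len(parts)):
        term = Fraction(1)
        for i, a in zip(idx, parts):
            term *= Fraction(values[i]) ** a
        total += term
        n_terms += 1
    return total / n_terms


def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        yield [[first]] + partition
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


def polykay(values, parts):
    """Value of the polykay (parts), built as an o-product of one-part k's."""
    expansion = {(): 1}
    for a in parts:
        step = {}
        for partition in set_partitions(list(range(a))):
            m = len(partition)
            sizes = tuple(len(block) for block in partition)
            sign = (-1) ** (m - 1) * factorial(m - 1)
            for left, c_left in expansion.items():
                key = tuple(sorted(left + sizes))
                step[key] = step.get(key, 0) + c_left * sign
        expansion = step
    return sum(c * bracket(values, key) for key, c in expansion.items() if c != 0)


x = [1, 2, 4, 7, 11]                       # hypothetical data, n = 5
n = len(x)
k = lambda *parts: polykay(x, parts)

checks = {
    "k12*k2": (k(1, 2) * k(2),
               Fraction(n - 3, n * (n - 1)) * k(2, 3) + k(1, 4) / n
               + Fraction(n + 1, n - 1) * k(1, 2, 2)),
    "k2^2": (k(2) ** 2, k(2, 2) + k(4) / n + 2 * k(2, 2) / (n - 1)),
    "k1*k3": (k(1) * k(3), k(1, 3) + k(4) / n),
    "k1*k22": (k(1) * k(2, 2), k(1, 2, 2) + 2 * k(2, 3) / n),
    "k2*k3": (k(2) * k(3), k(2, 3) + k(5) / n + 6 * k(2, 3) / (n - 1)),
}
for name, (left, right) in checks.items():
    print(name, left == right)
```

A check on one data set does not prove a formula, but a single disagreement is enough to expose a slip in the algebra.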


11. The o-multiplication and one-part k's. We know (cp. Section 4) that the
generating functions of $\{\langle\langle a \rangle\rangle = \operatorname{aver} \langle a \rangle\}$, $\{\langle a \rangle^*\}$ and
$\{\langle a \rangle^{**}\}$ satisfy the relation

$$M_{\mathrm{aver}}(t) = M^*(t)\,M^{**}(t)$$

where the $\langle\langle a \rangle\rangle$ are the averages over all pairings of the $\langle a \rangle$ which are
defined for all pairings of the sets defining the $\langle a \rangle^*$ and $\langle a \rangle^{**}$. To obtain
the one-part k's without reference to the theory of infinite populations, and to
prepare the groundwork for the introduction of the multipart k's, we introduce a
symbolic multiplication among the following quantities: real numbers, all
$\langle\langle ab \cdots e \rangle\rangle$, all $\langle ab \cdots e \rangle$, all $\langle ab \cdots e \rangle^*$, all
$\langle ab \cdots e \rangle^{**}$, the integer powers of an indeterminate $t$, and all linear
combinations of the above. This multiplication is written "$\circ$" and is defined
to satisfy:
(1) except when a bracket is multiplied by a bracket of the same family,
o-multiplication of the elementary quantities is ordinary multiplication,
(2) o-multiplication is distributive with respect to addition, subtraction and
multiplication by real numbers,
(3) o-multiplication of brackets from the same family is accomplished by
combining indices, as in $\langle 23 \rangle \circ \langle 14 \rangle = \langle 2314 \rangle = \langle 1234 \rangle$.

As examples of rule 3 we have

$$\langle 11 \rangle^* \circ \langle 34 \rangle^* = \langle 1134 \rangle^*,$$
$$\langle 2 \rangle^{**} \circ (\langle 2 \rangle^{**} - \langle 11 \rangle^{**}) = \langle 22 \rangle^{**} - \langle 112 \rangle^{**},$$

where rule 2 was used in the latter case, while rule 1 shows that

$$\langle\langle 2 \rangle\rangle \circ \langle 1 \rangle^* = \langle\langle 2 \rangle\rangle \langle 1 \rangle^*,$$
$$\langle 3 \rangle \circ \langle 24 \rangle^{**} = \langle 3 \rangle \langle 24 \rangle^{**}.$$
In terms of this symbolic multiplication we have a commutative ring with formal
power series in $t$. We can form formal o-exponentials and o-logarithms of
appropriate expressions, and these functions will have the usual formal properties.
Thus

$$\text{o-exp}(tX) = 1 + tX + \frac{t^2}{2!}(X \circ X) + \frac{t^3}{3!}(X \circ X \circ X) + \frac{t^4}{4!}(X \circ X \circ X \circ X) + \cdots,$$

$$\text{o-log}(1 + tX) = tX - \frac{t^2}{2}(X \circ X) + \frac{t^3}{3}(X \circ X \circ X) - \frac{t^4}{4}(X \circ X \circ X \circ X) + \cdots,$$

and, in particular

$$\text{o-log}[(1 + tX_1) \circ (1 + tX_2)] = \text{o-log}(1 + tX_1) + \text{o-log}(1 + tX_2).$$


Now $M^*(t)$ involves brackets with one asterisk, and $M^{**}(t)$ involves brackets
with two. Hence

$$M^*(t) \circ M^{**}(t) = M^*(t)\,M^{**}(t) = M_{\mathrm{aver}}(t)$$

and, taking o-logarithms on both sides

$$\text{o-log}\, M^*(t) + \text{o-log}\, M^{**}(t) = \text{o-log}\, M_{\mathrm{aver}}(t).$$

If we write

$$\psi(t) = (1)\,t + (2)\frac{t^2}{2!} + (3)\frac{t^3}{3!} + \cdots = \text{o-log}\, M(t)$$

and define $\psi_{\mathrm{aver}}(t)$, $\psi^*(t)$ and $\psi^{**}(t)$ similarly, we have

$$\psi_{\mathrm{aver}}(t) = \psi^*(t) + \psi^{**}(t)$$

and, comparing coefficients

$$((j)) = (j)^* + (j)^{**}$$

where $((j))$ is the same function of the $\langle\langle a \rangle\rangle$ as $(j)^*$ is of the $\langle a \rangle^*$
and $(j)^{**}$ is of the $\langle a \rangle^{**}$. Thus we have defined $(j) = k_j$ so as to have the
right property.

We have only to calculate the relations explicitly.
To do this, we have only to write out

$$\psi(t) = \text{o-log}\, M(t)$$

remembering to use o-multiplication on the right. We find
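$$\begin{aligned}
t k_1 + \frac{t^2}{2!} k_2 + \frac{t^3}{3!} k_3 + \cdots
&= t\langle 1 \rangle + \frac{t^2}{2!}\langle 2 \rangle + \frac{t^3}{3!}\langle 3 \rangle + \cdots \\
&\quad - \tfrac{1}{2}\Big(t\langle 1 \rangle + \frac{t^2}{2!}\langle 2 \rangle + \cdots\Big) \circ \Big(t\langle 1 \rangle + \frac{t^2}{2!}\langle 2 \rangle + \cdots\Big) \\
&\quad + \tfrac{1}{3}(t\langle 1 \rangle + \cdots) \circ (t\langle 1 \rangle + \cdots) \circ (t\langle 1 \rangle + \cdots) - \cdots \\
&= t\langle 1 \rangle + \frac{t^2}{2!}\langle 2 \rangle + \frac{t^3}{3!}\langle 3 \rangle + \cdots \\
&\quad - \tfrac{1}{2}(t^2\langle 11 \rangle + t^3\langle 12 \rangle + \cdots) + \tfrac{1}{3}(t^3\langle 111 \rangle + \cdots) - \cdots \\
&= t\langle 1 \rangle + \frac{t^2}{2!}(\langle 2 \rangle - \langle 11 \rangle) + \frac{t^3}{3!}(\langle 3 \rangle - 3\langle 12 \rangle + 2\langle 111 \rangle) + \cdots,
\end{aligned}$$

so that $k_1 = \langle 1 \rangle$, $k_2 = \langle 2 \rangle - \langle 11 \rangle$,
$k_3 = \langle 3 \rangle - 3\langle 12 \rangle + 2\langle 111 \rangle$, $\cdots$. In case the
population is infinite, the symmetric means become moment products, the k's
become cumulants and the o-multiplication becomes ordinary multiplication.
These formulas become the well-known relations connecting cumulants and
moments,

$$\kappa_1 = \mu'_1, \quad \kappa_2 = \mu'_2 - \mu'^2_1, \quad \kappa_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2\mu'^3_1,$$

and the coefficients up to order 12 are given in Kendall ([4], section 3.13).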
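
The expansion of the o-logarithm can be carried out mechanically as a truncated power series. The sketch below is a minimal illustration with hypothetical names: a bracket polynomial is a dictionary from index tuples to coefficients, the o-product of brackets of the same family combines indices (rule 3), and $\psi(t) = \text{o-log}\, M(t)$ is accumulated term by term from the series for log(1 + Y) with $Y = M(t) - 1$.

```python
from fractions import Fraction
from math import factorial


def o_mul(p, q):
    """o-product of two bracket polynomials {indices: coefficient}."""
    out = {}
    for a, ca in p.items():
        for b, cb in q.items():
            key = tuple(sorted(a + b))            # rule 3: combine the indices
            out[key] = out.get(key, 0) + ca * cb
    return out


def series_o_mul(A, B, order):
    """o-product of two power series in t (lists indexed by power), truncated."""
    C = [dict() for _ in range(order + 1)]
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            if i + j <= order:
                for key, c in o_mul(a, b).items():
                    C[i + j][key] = C[i + j].get(key, 0) + c
    return C


def one_part_k(order):
    """k_1 .. k_order expressed in brackets, from psi(t) = o-log M(t)."""
    # Y(t) = M(t) - 1 = sum over j >= 1 of <j> t^j / j!
    Y = [dict()] + [{(j,): Fraction(1, factorial(j))} for j in range(1, order + 1)]
    power = [{(): Fraction(1)}] + [dict() for _ in range(order)]   # Y to the power 0
    psi = [dict() for _ in range(order + 1)]
    for m in range(1, order + 1):
        power = series_o_mul(power, Y, order)       # Y o Y o ... o Y, m factors
        weight = Fraction((-1) ** (m + 1), m)       # log(1 + Y) = Y - Y^2/2 + Y^3/3 - ...
        for d in range(order + 1):
            for key, c in power[d].items():
                psi[d][key] = psi[d].get(key, 0) + weight * c
    # psi(t) = sum of k_j t^j / j!, so k_j is j! times the coefficient of t^j
    return {j: {key: c * factorial(j) for key, c in psi[j].items() if c != 0}
            for j in range(1, order + 1)}


k = one_part_k(4)
for j in sorted(k):
    print(j, k[j])
print(o_mul(k[2], k[2]))    # the two-part polykay (22) in brackets
```

Terms of order higher than the truncation are simply discarded, which is harmless because Y has no constant term.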


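12. Commutativity. We now wish to show why o-multiplication is commutative
with additive pairing. We recall (from Section 4) that

$$\operatorname{aver}\{\langle j \rangle\} = \langle\langle j \rangle\rangle
= \sum_k \binom{j}{k} \langle j-k \rangle^* \langle k \rangle^{**}
= \sum_k \binom{j}{k} \langle j-k \rangle^* \circ \langle k \rangle^{**}$$

and proceed to find the corresponding formula for a two-part bracket. We have

$$\begin{aligned}
\langle gj \rangle &= \frac{1}{n(n-1)} \sum^{\ne} \big(x_i^* + x^{**}_{\sigma(i)}\big)^g \big(x_l^* + x^{**}_{\sigma(l)}\big)^j \\
&= \frac{1}{n(n-1)} \sum^{\ne} \sum_h \sum_k \binom{g}{h}\binom{j}{k}
 (x_i^*)^{g-h} (x_l^*)^{j-k} \big(x^{**}_{\sigma(i)}\big)^h \big(x^{**}_{\sigma(l)}\big)^k
\end{aligned}$$

and when we average over a random pairing, we may split the $x^*$ from the $x^{**}$,
just as for one-part brackets, obtaining

$$\langle\langle gj \rangle\rangle = \operatorname{aver} \langle gj \rangle
= \sum_h \sum_k \binom{g}{h}\binom{j}{k} \langle g-h,\, j-k \rangle^* \langle hk \rangle^{**}.$$

But

$$\langle gj \rangle = \langle g \rangle \circ \langle j \rangle, \quad
\langle g-h,\, j-k \rangle^* = \langle g-h \rangle^* \circ \langle j-k \rangle^*, \quad
\langle hk \rangle^{**} = \langle h \rangle^{**} \circ \langle k \rangle^{**},$$

and this becomes

$$\begin{aligned}
\langle\langle\, \langle g \rangle \circ \langle j \rangle \,\rangle\rangle
&= \sum_h \sum_k \binom{g}{h}\binom{j}{k} \langle g-h \rangle^* \circ \langle h \rangle^{**} \circ \langle j-k \rangle^* \circ \langle k \rangle^{**} \\
&= \Big[\sum_h \binom{g}{h} \langle g-h \rangle^* \circ \langle h \rangle^{**}\Big] \circ \Big[\sum_k \binom{j}{k} \langle j-k \rangle^* \circ \langle k \rangle^{**}\Big] \\
&= \langle\langle g \rangle\rangle \circ \langle\langle j \rangle\rangle.
\end{aligned}$$

Thus averaging over random pairing commutes with o-multiplication for two-part
brackets.

An entirely analogous proof holds for brackets with more than two parts,
and, since the o-multiplication was defined for brackets and extended by linearity,
we have commutativity in general. In particular, we have commutativity for
polykays, so that

$$((1) \circ (2)) = \operatorname{aver}\{(1) \circ (2)\} = ((1)) \circ ((2)),$$
$$((g) \circ (j)) = \operatorname{aver}\{(g) \circ (j)\} = ((g)) \circ ((j)),$$

a result we will use almost at once.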


13. The multipart k's. We shall now define the multipart k's by symbolic
multiplication, putting

$$(12) = k_{12} = (1) \circ (2) = k_1 \circ k_2,$$
$$(abc \cdots e) = (a) \circ (b) \circ (c) \circ \cdots \circ (e).$$

This means, of course, that we may find the expressions for the multipart k's
by writing out the corresponding single-part k's in terms of brackets and
symbolically multiplying out. Thus

$$(22) = (2) \circ (2) = [\langle 2 \rangle - \langle 11 \rangle] \circ [\langle 2 \rangle - \langle 11 \rangle] = \langle 22 \rangle - 2\langle 112 \rangle + \langle 1111 \rangle.$$

We notice that, for the case of additive random pairing

$$\begin{aligned}
((12)) = ((1)) \circ ((2)) &= [(1)^* + (1)^{**}] \circ [(2)^* + (2)^{**}] \\
&= (1)^* \circ (2)^* + (1)^* \circ (2)^{**} + (1)^{**} \circ (2)^* + (1)^{**} \circ (2)^{**} \\
&= (12)^* + (1)^*(2)^{**} + (1)^{**}(2)^* + (12)^{**}
\end{aligned}$$

and that the formula for $((ab))$ is entirely analogous to this. Indeed, more
complex expressions of similar form hold for the more-than-two-part k's and
we immediately see that all the multipart k's satisfy the previously announced
pairing formulas.

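To complete our transformation formulas, we need to express brackets in
terms of polykays. To this end, we write out

$$M(t) = \text{o-exp}(\psi(t));$$

we find

$$\begin{aligned}
1 + t\langle 1 \rangle + \frac{t^2}{2!}\langle 2 \rangle + \frac{t^3}{3!}\langle 3 \rangle + \cdots
&= 1 + t k_1 + \frac{t^2}{2!} k_2 + \frac{t^3}{3!} k_3 + \cdots \\
&\quad + \tfrac{1}{2}\Big(t k_1 + \frac{t^2}{2!} k_2 + \cdots\Big) \circ \Big(t k_1 + \frac{t^2}{2!} k_2 + \cdots\Big) \\
&\quad + \tfrac{1}{6}(t k_1 + \cdots) \circ (t k_1 + \cdots) \circ (t k_1 + \cdots) + \cdots \\
&= 1 + t k_1 + \frac{t^2}{2!} k_2 + \frac{t^3}{3!} k_3 + \cdots \\
&\quad + \tfrac{1}{2}(t^2 k_1 \circ k_1 + t^3 k_1 \circ k_2 + \cdots) + \tfrac{1}{6}(t^3 k_1 \circ k_1 \circ k_1 + \cdots) + \cdots \\
&= 1 + t k_1 + \frac{t^2}{2!} k_2 + \frac{t^3}{3!} k_3 + \cdots \\
&\quad + \tfrac{1}{2}(t^2 k_{11} + t^3 k_{12} + \cdots) + \tfrac{1}{6}(t^3 k_{111} + \cdots) + \cdots \\
&= 1 + t k_1 + \frac{t^2}{2!}(k_2 + k_{11}) + \frac{t^3}{3!}(k_3 + 3k_{12} + k_{111}) + \cdots
\end{aligned}$$

and comparing coefficients,

$$\langle 1 \rangle = k_1, \quad \langle 2 \rangle = k_2 + k_{11}, \quad \langle 3 \rangle = k_3 + 3k_{12} + k_{111}, \cdots.$$

For an infinite population, these reduce to the familiar formulas expressing
moments in terms of cumulants, namely

$$\mu'_1 = \kappa_1, \quad \mu'_2 = \kappa_2 + \kappa_1^2, \quad \mu'_3 = \kappa_3 + 3\kappa_2\kappa_1 + \kappa_1^3, \cdots$$

and again these can be found up to order 12 in Kendall ([4], section 3.13). This
time, however, the nature of the exponential function makes it easy to write
down the coefficient of the polykay whose indices are $\alpha$ a's, $\beta$ b's, $\cdots$,
$\delta$ d's in $\langle \alpha a + \beta b + \cdots + \delta d \rangle$. It is

$$\frac{(\alpha a + \beta b + \cdots + \delta d)!}{(a!)^{\alpha} (b!)^{\beta} \cdots (d!)^{\delta}} \cdot \frac{1}{\alpha!\, \beta! \cdots \delta!}.$$

For example, the coefficient of $k_{12}$ in $\langle 3 \rangle$ is

$$\frac{3!}{1!\,2!\,1!\,1!} = 3.$$

Thus individual coefficients are easily checked.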
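
The coefficient formula translates directly into a few lines of code. This is a minimal sketch with a hypothetical function name: the indices of the polykay are given as a tuple, equal indices are grouped to obtain the multiplicities, and the quotient is exact.

```python
from collections import Counter
from math import factorial


def polykay_coefficient(indices):
    """Coefficient of the polykay with the given indices in the one-part bracket
    whose single index is the sum of those indices."""
    weight = sum(indices)
    denominator = 1
    for part, multiplicity in Counter(indices).items():
        denominator *= factorial(part) ** multiplicity * factorial(multiplicity)
    return factorial(weight) // denominator


print(polykay_coefficient((1, 2)))        # coefficient of k_12 in <3>
print(polykay_coefficient((1, 1, 2)))     # coefficient of k_112 in <4>
print(polykay_coefficient((2, 2)))        # coefficient of k_22 in <4>
```

The printed values can be compared with the expansions of $\langle 3 \rangle$ and $\langle 4 \rangle$ in polykays.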


REFERENCES

[1] S. H. Abdel-Aty, "Tables of generalized k-statistics," Biometrika, Vol. 41 (1954),
pp. 253-260.

[2] F. N. David and M. G. Kendall, "Tables of symmetric functions, Part I,"
Biometrika, Vol. 36 (1949), pp. 431-439.

[3] Paul L. Dressel, "Statistical seminvariants and their estimates with particular
emphasis on their relation to algebraic invariants," Annals of Mathematical
Statistics, Vol. 11 (1940), pp. 33-57.

[4] Maurice G. Kendall, The Advanced Theory of Statistics, Volume 1, Chas. Griffin
and Co., London, 1943.

[5] M. G. Kendall, "Moment-statistics in samples from a finite population," Biometrika,
Vol. 39 (1952), pp. 14-16.

[6] John W. Tukey, "Some sampling simplified," Journal of the American Statistical
Association, Vol. 45 (1950), pp. 501-519.

[7] John W. Tukey, "Variances of variance components: I. Balanced designs." To appear
in the Annals of Mathematical Statistics.

[8] John W. Tukey, "Variances of variance components: II. Unbalanced single
classifications." To appear in the Annals of Mathematical Statistics.

[9] John W. Tukey, "Variance components: III. The third moment in a balanced single
classification." To appear in the Annals of Mathematical Statistics.

[10] John Wishart, "Moment coefficients of the k-statistics in samples from a finite
population," Biometrika, Vol. 39 (1952), pp. 1-13.

[11] John Wishart, "The combinatorial development of the cumulants of the k-statistics,"
Trabajos de Estadistica, Vol. 3 (1952), pp. 13-26.





SYMMETRIC FUNCTIONS OF A TWO-WAY ARRAY¹


By ROBERT HOOKE²


Princeton University 


1. Summary. A family of polynomials in the elements of a two-way array, or 
matrix, is introduced. This family is an extension, from sets to matrices, of the 
family of symmetric polynomials $k_1$, $k_2$, $k_{11}$, $k_3$, $k_{12}$, etc., defined by Tukey
[6], christened "polykays" in [7], and which are a generalization of the family
$k_1$, $k_2$, etc., defined by R. A. Fisher [1]. The polynomials of the present paper,
called “bipolykays,” are symmetric functions in the sense that they are in- 
variant under permutation of rows and/or columns of the matrix. This paper 
defines the bipolykays, shows that they are inherited on the average, develops 
the formulas for use in random pairing, and provides tables for conversion and 
for multiplication. A description of applications (see [2], [3], and [4]) will be 
postponed until a later paper. These applications include (a) finding expressions 
for sampling moments of functions of the elements of a matrix which is a “bi- 
sample” from a larger matrix, (b) finding expressions for sampling moments of 
functions (such as estimates of variance components) associated with the analysis 
of variance of a two-way table with systematic interactions, and (c) finding un- 
biased estimators for the variances and covariances of estimated variance com- 
ponents in a two-way table without interactions. 


2. Introduction. Let $x_I$ ($I = 1, 2, \cdots, N$) be any population of N numbers,
and let $x_i$ ($i = 1, 2, \cdots, n$) represent elements of a sample of size n from this
population. Let $f(n; x_1, \cdots, x_n)$ be a polynomial which is symmetric in the $x_i$
and has coefficients which are functions of n. Such a function extends obviously
to a polynomial $f(N; x_1, \cdots, x_N)$, the corresponding symmetric polynomial in
the $x_I$, with the coefficients changed only by replacing n by N. Writing "ave"
for the operation of averaging over all $\binom{N}{n}$ distinct samples of size n from the
population, we say that $f(n; x_1, \cdots, x_n)$ is "inherited on the average" [6] if

$$(1) \qquad \operatorname{ave} f(n; x_1, \cdots, x_n) = f(N; x_1, \cdots, x_N).$$


The functions $k_1$, $k_2$, $k_{11}$, etc., defined in [6] and now called polykays, are
symmetric polynomials that are inherited on the average. Any symmetric
polynomial can be expressed as a linear combination of polykays, so that the average
value (or expected value, if the $\binom{N}{n}$ distinct samples are assigned equal
probabilities) of the polynomial can be found simply by replacing each polykay in
this linear combination by the corresponding population polykay, i.e., by
applying (1) to each term.

Received July 28, 1954.
¹ Prepared in connection with research sponsored by the Office of Naval Research.
² Now with the Westinghouse Research Laboratories, East Pittsburgh, Pa.

In order to use polykays in connection with a linear model, say 


$$y_{ij} = m_i + e_{ij},$$


one needs to find the polykays of the y’s in terms of those of the m’s and e’s. 
The rules for doing this are called ‘“‘pairing formulas” (Section 3), and an im- 
portant advantage of polykays over most other symmetric polynomial functions 
inherited on the average is the simplicity of their pairing formulas. 

In this paper we shall consider a matrix population of numbers
$x_{IJ}$ ($I = 1, 2, \cdots, R$; $J = 1, 2, \cdots, C$) from which a bisample (sample matrix)
is selected by taking a sample of r of the R rows and another sample of c of the 
C columns and forming the matrix whose elements are at the intersections of 
these selected rows and columns. Symmetric polynomial functions of such 
matrices (i.e., polynomial functions of the elements of a matrix which are in- 
variant under permutation of rows and/or columns) will be considered. It will 
be shown that any such function can be expressed as a linear combination of 
bipolykays, which will be defined as a special family of functions that are in- 
herited on the average and have simple pairing formulas. These properties make 
the bipolykays useful in the determination of moments of moments, for example, 
associated with a two-way classification. 

The author wishes to express here his indebtedness to Prof. J. W. Tukey for 
several helpful suggestions, and to Dr. Frederic Lord for posing the original 
matrix sampling problem (see [2]) which started this investigation. 


3. Polykays. Polykays are defined by examples in [6], and a general definition 
may be found in [7], [8], or [9]. Since a different, though equivalent, definition 
appears to be more suited to the extension to bipolykays, this section will be 
devoted to a general definition of polykays and to the derivation of those proper- 
ties which will be required in this paper. We begin with some notation and 
terminology of [6] which will be used throughout. 

The symbol $\sum^{\ne}$ will mean a sum over all subscripts that follow, but such that
subscripts represented by different letters must remain unequal throughout the
summation. For example,

$$\sum_{i,j=1}^{2}{}^{\ne}\; x_i x_j = x_1 x_2 + x_2 x_1.$$

A symmetric mean is a polynomial

$$\frac{1}{M} \sum^{\ne} x_i^a x_j^b \cdots x_m^d,$$

where the subscripts are summed from 1 to n (for samples) or from 1 to N (for
populations), the exponents are positive integers, and M is the number of terms
in the summation. When the sample (or population) size is given, the symmetric
mean is specified by the exponents, and so is abbreviated by writing the
exponents within brackets, as in

$$\langle abd \rangle = \frac{1}{n(n-1)(n-2)} \sum^{\ne} x_i^a x_j^b x_k^d.$$

When a symmetric mean is defined over a population, this fact is indicated by
a prime, as in

$$\langle abd \rangle' = \frac{1}{N(N-1)(N-2)} \sum^{\ne} x_I^a x_J^b x_K^d.$$

It is obvious that any symmetric polynomial function can be expressed as a
linear combination of symmetric means. Since symmetric means are inherited
on the average [6], they are sufficient for the problem of finding expressions for
sampling moments of moments of a single sample. However, in dealing with an
additive model, one works with numbers which are sums of numbers sampled
from different populations. To provide for this case, Tukey uses the notion of
"random pairing": this means taking two samples, $(x_1, \cdots, x_n)$ and
$(y_1, \cdots, y_n)$, the order within each having been independently randomized, and
adding the two to obtain a new sample $(z_1, \cdots, z_n)$, where $z_i = x_i + y_i$. For
symmetric functions of the z's one wants the average value (where the average is
taken with respect both to sampling and to randomization of order within samples)
expressed, by means of a "pairing formula", in terms of symmetric functions of
the two original populations. Using "ave" as before, together with "aver",
meaning "average over randomization", and using one and two primes,
respectively, for the populations of x's and y's, we have the following example of a
pairing formula as applied to the symmetric mean $\langle 12 \rangle$ taken over the z's:

$$\begin{aligned}
\operatorname{ave} \operatorname{aver} \langle 12 \rangle &= \langle 12 \rangle' + \langle 1 \rangle' \langle 2 \rangle'' + \langle 2 \rangle' \langle 1 \rangle'' \\
&\quad + 2\langle 1 \rangle' \langle 11 \rangle'' + 2\langle 11 \rangle' \langle 1 \rangle'' + \langle 12 \rangle''.
\end{aligned}$$
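The polykays are linear combinations of symmetric means chosen, among
other reasons, because of their having simple pairing formulas. Those of degree
3 or less are defined as follows:

$$(2) \qquad
\begin{aligned}
k_1 &= \langle 1 \rangle, & k_{111} &= \langle 111 \rangle, \\
k_{11} &= \langle 11 \rangle, & k_{12} &= \langle 12 \rangle - \langle 111 \rangle, \\
k_2 &= \langle 2 \rangle - \langle 11 \rangle, & k_3 &= \langle 3 \rangle - 3\langle 12 \rangle + 2\langle 111 \rangle.
\end{aligned}$$

The pairing formula for $k_{12}$ becomes, for example,

$$\operatorname{ave} \operatorname{aver} k_{12} = k'_{12} + k'_1 k''_2 + k'_2 k''_1 + k''_{12}.$$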
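
Both pairing formulas can be confirmed by exhaustive averaging on small populations. The sketch below uses hypothetical populations and function names: "ave" runs over all samples of size n from each population, and "aver" over all relative orderings of the two samples, which is sufficient because a symmetric mean does not depend on the order of the paired values.

```python
from fractions import Fraction
from itertools import combinations, permutations


def bracket(values, parts):
    """Symmetric mean <parts>: average of x_i^a x_j^b ... over distinct i, j, ..."""
    total, n_terms = Fraction(0), 0
    for idx in permutations(range(len(values)), len(parts)):
        term = Fraction(1)
        for i, a in zip(idx, parts):
            term *= Fraction(values[i]) ** a
        total += term
        n_terms += 1
    return total / n_terms


def ave_aver(pop_x, pop_y, n, parts):
    """Average of <parts> of z = x + y over all samples and all random pairings."""
    total, count = Fraction(0), 0
    for sample_x in combinations(pop_x, n):
        for sample_y in combinations(pop_y, n):
            for order_y in permutations(sample_y):
                z = [x + y for x, y in zip(sample_x, order_y)]
                total += bracket(z, parts)
                count += 1
    return total / count


pop_x = [1, 2, 4, 7]     # hypothetical population of x's (one prime)
pop_y = [0, 3, 5, 6]     # hypothetical population of y's (two primes)
bx = lambda *parts: bracket(pop_x, parts)
by = lambda *parts: bracket(pop_y, parts)

# Pairing formula for the symmetric mean <12>.
mean_12 = ave_aver(pop_x, pop_y, 3, (1, 2))
formula_12 = (bx(1, 2) + bx(1) * by(2) + bx(2) * by(1)
              + 2 * bx(1) * by(1, 1) + 2 * bx(1, 1) * by(1) + by(1, 2))

# Pairing formula for the polykay k_12 = <12> - <111>.
k12 = lambda b: b(1, 2) - b(1, 1, 1)
k2 = lambda b: b(2) - b(1, 1)
k1 = lambda b: b(1)
mean_k12 = mean_12 - ave_aver(pop_x, pop_y, 3, (1, 1, 1))
formula_k12 = k12(bx) + k1(bx) * k2(by) + k2(bx) * k1(by) + k12(by)

print(mean_12 == formula_12, mean_k12 == formula_k12)
```

The polykay formula has four terms where the symmetric mean needs six, which is the simplification the definitions in (2) are chosen to achieve.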


The remainder of this section consists of a general definition of the polykays
and a derivation of the pairing formulas for symmetric means and polykays. A
new notation for symmetric means (to be extended later to polykays) will first
be introduced. Henceforth the notation used in (2), above, will be referred to as
the primary notation, and that used in (3), below, as the secondary notation.
DEFINITION. The entries $a, b, \cdots, d$ of a symmetric mean $\langle ab \cdots d\rangle$ of degree
m form a partition of the integer m. It will be convenient to represent such a
partition in terms of m distinct symbols, so that the secondary notation for
$\langle ab \cdots d\rangle$ will be

$$\langle q_1 q_2 \cdots q_a,\ r_1 r_2 \cdots r_b,\ \cdots,\ s_1 s_2 \cdots s_d\rangle, \tag{3}$$

where commas are used to separate the parts of the partition, and the lengths
of the parts are the positive integers $a, b, \cdots, d$, whose sum is m. Any use of the
word partition below will refer to an expression such as that enclosed in $\langle\ \rangle$'s in
(3). Two partitions are equivalent (not distinct) if they are identical, except
possibly for the order of parts and the order of symbols within a part. Greek
letters will be used to represent arbitrary partitions. A partition $\beta$ is a subpartition
of a partition $\alpha$ if $\alpha$ can be made equivalent to $\beta$ merely by the insertion of
one or more commas. A dichotomy of a partition $\alpha$ is an ordered set $\{\alpha_1, \alpha_2\}$ of
two partitions, $\alpha_1$ and $\alpha_2$, such that $\alpha_1$ consists of some of the symbols comprising
$\alpha$, and $\alpha_2$ of the remaining ones, and such that any two symbols which both
occur in $\alpha_1$ or both in $\alpha_2$ belong there to the same part if and only if they belonged
to the same part of $\alpha$. The null partition will be denoted by $\phi$, so that $\{\phi, \alpha\}$
and $\{\alpha, \phi\}$ are dichotomies of $\alpha$. A simple dichotomy of $\alpha$ into $\{\alpha_1, \alpha_2\}$ has the
property that each part of $\alpha$ belongs entirely to $\alpha_1$ or to $\alpha_2$. The join of partitions
$\alpha_1$ and $\alpha_2$, having no symbols in common, is that partition $\alpha$ such that $\{\alpha_1, \alpha_2\}$
and $\{\alpha_2, \alpha_1\}$ are simple dichotomies of $\alpha$. An expression such as $\langle\alpha\rangle$, with brackets
enclosing a Greek letter, will denote a symmetric mean, not with just one entry,
but with entries which are the lengths of the parts of the partition $\alpha$. The symmetric
mean $\langle\phi\rangle$ is defined to be 1.
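The definitions of dichotomy and simple dichotomy can be made concrete with a short enumeration. This is an illustrative sketch only: the representation of a partition as a list of strings and the function name `dichotomies` are assumptions made here, not notation from the paper. Each subset of the symbols gives one ordered dichotomy, and the dichotomy is simple when no part of the original partition is split.

```python
from itertools import combinations

def dichotomies(partition):
    """Yield (alpha1, alpha2, is_simple) for every ordered dichotomy.

    `partition` is a list of parts, each part a string of distinct symbols,
    e.g. ["p", "qs"] stands for the partition (p, qs).
    """
    symbols = [s for part in partition for s in part]
    for size in range(len(symbols) + 1):
        for subset in combinations(symbols, size):
            chosen = set(subset)
            # Restrict every part to the chosen symbols (alpha1) or the rest (alpha2).
            a1 = ["".join(s for s in part if s in chosen) for part in partition]
            a2 = ["".join(s for s in part if s not in chosen) for part in partition]
            a1 = [p for p in a1 if p]
            a2 = [p for p in a2 if p]
            # Simple: each part lies wholly on one side.
            simple = all(set(part) <= chosen or not (set(part) & chosen)
                         for part in partition)
            yield a1, a2, simple

all_d = list(dichotomies(["p", "qs"]))
simple_d = [(a1, a2) for a1, a2, simple in all_d if simple]
print(len(all_d), len(simple_d))  # 8 4
```

For the partition (p, qs) this gives the eight dichotomies behind the six collected terms in the pairing formula for $\langle 12\rangle$, and four simple dichotomies.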
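THEOREM 1. The pairing formula for a symmetric mean $\langle\alpha\rangle$ is

$$\text{ave aver}\,\langle\alpha\rangle = \sum \langle\beta\rangle'\langle\gamma\rangle'',$$

where the summation extends over all distinct dichotomies $\{\beta, \gamma\}$ of $\alpha$.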
PROOF. We recall that $\langle\alpha\rangle$ is a symmetric mean for a sample of numbers of the
form $x_i + y_i$. Hence if $\langle\alpha\rangle$ is of the form (3), we have

$$\langle\alpha\rangle = \frac{1}{M}\sum^{\neq} [(x_i + y_i)\cdots(x_i + y_i)]\,[(x_j + y_j)\cdots(x_j + y_j)]\cdots[(x_k + y_k)\cdots(x_k + y_k)],$$

where there are a, b, and d equal factors within the first, second, and last pair
of square brackets, respectively. For a fixed choice of $i, j, \cdots, k$, the product
following the $\sum^{\neq}$ symbol expands to a sum of $2^{a+b+\cdots+d}$ terms, each of the form

$$X_{i,j,\cdots,k}\,Y_{i,j,\cdots,k}, \tag{4}$$

where

$$X_{i,j,\cdots,k} = x_i^A x_j^B \cdots x_k^D$$

and

$$Y_{i,j,\cdots,k} = y_i^{a-A} y_j^{b-B} \cdots y_k^{d-D}.$$

Each term of the form (4) must be summed over the allowable sets of values of
$i, j, \cdots, k$, averaged over randomization, and divided by M; aver $\langle\alpha\rangle$ is the
sum of these individual results, one for each split of $a, b, \cdots, d$ into $A, B, \cdots, D$
and $a - A, b - B, \cdots, d - D$. From the independence of the two randomizations,
we have

$$\frac{1}{M}\,\text{aver}\sum^{\neq} X_{i,j,\cdots,k}\,Y_{i,j,\cdots,k} = \frac{1}{M}\sum^{\neq} \text{aver}\,X_{i,j,\cdots,k}\ \text{aver}\,Y_{i,j,\cdots,k}. \tag{5}$$

But aver $X_{i,j,\cdots,k}$ is simply

$$\langle q_1 \cdots q_A,\ r_1 \cdots r_B,\ \cdots,\ s_1 \cdots s_D\rangle^{*} = \langle\beta\rangle^{*},$$

where $\langle\beta\rangle$ is a symmetric mean of the type mentioned in the statement of the
theorem, the asterisk indicating that it at present refers only to the sample of
x's in question. Similarly, aver $Y_{i,j,\cdots,k}$ is $\langle\gamma\rangle^{**}$, $\gamma$ and $\beta$ being related as in the
statement of the theorem. Hence

$$\text{aver}\,\langle\alpha\rangle = \frac{1}{M}\sum\sum^{\neq} \text{aver}\,X_{i,j,\cdots,k}\ \text{aver}\,Y_{i,j,\cdots,k} = \frac{1}{M}\sum\sum^{\neq} \langle\beta\rangle^{*}\langle\gamma\rangle^{**}.$$

The M terms in the $\sum^{\neq}$ summation being equivalent, this reduces to

$$\text{aver}\,\langle\alpha\rangle = \sum \langle\beta\rangle^{*}\langle\gamma\rangle^{**},$$

this last summation being as defined in the statement of the theorem. The final
step is to average over samples. Since the samples are chosen independently
from different populations, and since the symmetric means are inherited on the
average, we have the theorem, namely,

$$\text{ave aver}\,\langle\alpha\rangle = \sum \langle\beta\rangle'\langle\gamma\rangle''.$$


DEFINITION. For partitions of a fixed number, m, of symbols, we say that

$$\text{rank}\ \alpha < \text{rank}\ \beta$$

if (a) the number of parts in $\alpha$ exceeds the number of parts in $\beta$, or if (b) $\alpha$ and $\beta$
have the same number of parts; but when the parts are arranged in order of
increasing length, the first $i - 1$ parts of $\alpha$ are equal in length to their corresponding
parts in $\beta$, while the ith part of $\alpha$ is shorter than the ith part of $\beta$.

Our definition of polykays will be in terms of the secondary notation, a polykay
being represented by $(\alpha)$ and distinguishable from a symmetric mean (in this
notation) only by the use of parentheses in place of $\langle\ \rangle$'s.

DEFINITION. The polykays of degree m are defined by the equations

$$\langle\alpha\rangle = (\alpha) + \sum_a (\beta_a), \tag{6}$$





where there is one equation for each symmetric mean $\langle\alpha\rangle$ of degree m, and where
the summation is over all distinct subpartitions $\beta_a$ of $\alpha$. [If there are S(m) symmetric
means of degree m, the S(m) equations (6) of course define the S(m)
polykays that occur on the right if and only if the determinant of the coefficients
of the distinct polykays does not vanish. (Two polykays, or two symmetric
means, are equivalent, or not distinct, if the partitions representing them can
be made equivalent by renaming the symbols.) Since, in each equation of (6),
the rank of $(\alpha)$ is greater than that of any of the $(\beta_a)$, then when those $(\beta_a)$
which may be equivalent are collected and results are ordered by descending rank,
the determinant has ones down the main diagonal and zeros below, so that its
value is 1.]

Since any symmetric polynomial function can be expressed as a linear com- 
bination of symmetric means, it follows from the definition just given that it 
can also be expressed as a linear combination of polykays. 

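EXAMPLE (m = 3). The symmetric means are $\langle 111\rangle$, $\langle 12\rangle$, and $\langle 3\rangle$, expressed
in the primary notation, or $\langle p, q, s\rangle$, $\langle p, qs\rangle$, and $\langle pqs\rangle$ in the secondary notation,
in order of ascending rank. The polykays are then defined by the equations

$$\begin{aligned}
\langle p, q, s\rangle &= (p, q, s),\\
\langle p, qs\rangle &= (p, qs) + (p, q, s),\\
\langle pqs\rangle &= (pqs) + (p, qs) + (q, ps) + (s, pq) + (p, q, s).
\end{aligned}$$

These may be solved to give

$$\begin{aligned}
(p, q, s) &= \langle p, q, s\rangle,\\
(p, qs) &= \langle p, qs\rangle - \langle p, q, s\rangle,\\
(pqs) &= \langle pqs\rangle - \langle p, qs\rangle - \langle q, ps\rangle - \langle s, pq\rangle + 2\langle p, q, s\rangle,
\end{aligned}$$

or, in the primary notation,

$$\begin{aligned}
k_{111} &= \langle 111\rangle,\\
k_{12} &= \langle 12\rangle - \langle 111\rangle,\\
k_3 &= \langle 3\rangle - 3\langle 12\rangle + 2\langle 111\rangle.
\end{aligned}$$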
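The triangular system (6) can also be solved mechanically for any degree. The sketch below is an illustration under assumptions made here (partitions stored as frozensets of frozensets, and the helper names `set_partitions`, `is_subpartition`, `polykay`, `by_type`); it is not the author's procedure. It applies (6) recursively, $(\alpha) = \langle\alpha\rangle - \sum_a (\beta_a)$, and then collects equivalent symmetric means to recover the primary-notation coefficients.

```python
from collections import Counter
from functools import lru_cache

def set_partitions(symbols):
    """All partitions of the tuple `symbols`, each a frozenset of frozensets."""
    if not symbols:
        return [frozenset()]
    first, rest = symbols[0], symbols[1:]
    out = []
    for p in set_partitions(rest):
        out.append(p | {frozenset([first])})             # `first` as its own part
        for part in p:                                    # `first` joins an existing part
            out.append((p - {part}) | {part | {first}})
    return out

def is_subpartition(beta, alpha):
    """True if beta arises from alpha by inserting one or more commas."""
    return beta != alpha and all(any(b <= a for a in alpha) for b in beta)

@lru_cache(maxsize=None)
def polykay(alpha):
    """Coefficients of the polykay (alpha) on symmetric means <beta>, via equation (6)."""
    coeffs = Counter({alpha: 1})
    universe = tuple(sorted(s for part in alpha for s in part))
    for beta in set_partitions(universe):
        if is_subpartition(beta, alpha):
            for gamma, c in polykay(beta).items():
                coeffs[gamma] -= c
    return coeffs

def by_type(coeffs):
    """Collapse to primary notation: key is the sorted tuple of part lengths."""
    out = Counter()
    for partition, c in coeffs.items():
        out[tuple(sorted(len(part) for part in partition))] += c
    return {key: value for key, value in out.items() if value}

# Coefficients of k_3: 1 on <3>, -3 on <12>, 2 on <111>, as in the example above.
k3 = by_type(polykay(frozenset([frozenset("pqs")])))
k4 = by_type(polykay(frozenset([frozenset("pqst")])))
print(k3)
print(k4)
```

The same routine gives the degree-4 polykay $k_4$ without further work; its coefficients sum to zero, as do those of $k_3$.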
THEOREM 2. The pairing formula for a polykay $(\alpha)$ is

$$\text{ave aver}\,(\alpha) = \sum (\beta)'(\gamma)'',$$

where the summation extends now (in contrast with the similarly written summation
of Theorem 1) only over the distinct simple dichotomies $\{\beta, \gamma\}$ of the partition $\alpha$.

PROOF. We obtain the result by induction on rank for a fixed degree m. For the
lowest rank, i.e., for $\langle 11 \cdots 1\rangle$ and $k_{11\cdots 1}$ (in the primary notation), the symmetric
mean and polykay are identical; since all dichotomies in this case are
simple, Theorem 2 holds for this rank by virtue of Theorem 1. For other ranks,
we observe in equation (6) that

$$\text{ave aver}\,\langle\alpha\rangle = \text{ave aver}\,(\alpha) + \sum_a \text{ave aver}\,(\beta_a) \tag{7}$$

and recall that the rank of each of the $\beta_a$ is less than that of $\alpha$. The induction
assumption then will be that the theorem has been proved for the $(\beta_a)$. Applying
the theorem to any particular ave aver $(\beta_a)$ gives us the sum of

$$(\gamma)'(\delta)''$$

over all distinct simple dichotomies $\{\gamma, \delta\}$ of $\beta_a$. None of these dichotomies can
arise from any of the other $\beta_a$, since any simple dichotomy determines the partition
from which it comes. Hence, since the various $\beta_a$ are all the distinct subpartitions
of $\alpha$, it follows that

$$\sum_a \text{ave aver}\,(\beta_a) = \sum (\gamma)'(\delta)'',$$

where the last sum extends over all $(\gamma)'(\delta)''$ such that $\{\gamma, \delta\}$ is a simple
dichotomy of some subpartition of $\alpha$. This is the same as saying that the sum
extends over all $(\gamma)'(\delta)''$ such that the join of $\gamma$ and $\delta$ is a subpartition of $\alpha$.
Going to the left side of (7), we have, from Theorem 1,

$$\text{ave aver}\,\langle\alpha\rangle = \sum \langle\lambda\rangle'\langle\mu\rangle'',$$

where the sum extends over all distinct dichotomies $\{\lambda, \mu\}$ of $\alpha$. Each $\langle\lambda\rangle'$ and
$\langle\mu\rangle''$ can be expressed in terms of polykays by equations (6). Since two $\lambda$'s
arising from distinct dichotomies of $\alpha$ cannot contain the same symbols,
$\sum \langle\lambda\rangle'\langle\mu\rangle''$ must be equal to

$$\sum (\xi)'(\eta)''$$

where this sum extends over all terms where $\xi$ and $\eta$ are subpartitions of some
$\lambda$ and $\mu$ (or $\xi = \lambda$ or $\eta = \mu$ or both), respectively, $\{\lambda, \mu\}$ being a dichotomy of
$\alpha$. This is evidently the same as the sum of all $(\xi)'(\eta)''$ such that the join of $\xi$
and $\eta$ is $\alpha$ or a subpartition of $\alpha$.

The first and third of the three terms of (6) have now been specified, and
ave aver $(\alpha)$ is equal to their difference, which is the sum of all $(\xi)'(\eta)''$ such that
the join of $\xi$ and $\eta$ is $\alpha$.


EXAMPLE. Consider the polykay $k_{12}$, or $(p, qs)$. The simple dichotomies of
$p, qs$ are

    p, qs and φ,
    p and qs,
    qs and p,
    φ and p, qs,

so that

$$\begin{aligned}
\text{ave aver}\,k_{12} &= \text{ave aver}\,(p, qs)\\
&= (p, qs)'(\phi)'' + (p)'(qs)'' + (qs)'(p)'' + (\phi)'(p, qs)''\\
&= k_{12}' + k_1' k_2'' + k_2' k_1'' + k_{12}'',
\end{aligned}$$

$(\phi)$ being 1.







4. Bisamples and generalized symmetric means. We turn now to the problem 
of the present paper. We suppose a population matrix 


$$\|x_{IJ}\|, \qquad I = 1, 2, \cdots, R;\ J = 1, 2, \cdots, C,$$

from which a bisample

$$\|x_{ij}\|, \qquad i = 1, 2, \cdots, r;\ j = 1, 2, \cdots, c,$$

is selected as described in Section 2. Any polynomial symmetric in the $x_{ij}$ (in
the sense defined in Section 2) is a linear combination of sums of the type


$$\sum^{\neq} x_{pq}^{a_{pq}} \cdots x_{st}^{a_{st}},$$

where the symbol $\sum^{\neq}$, for two-way arrays, will mean summation over all subsequent
subscripts, with the restriction that row subscripts represented by different
letters must remain different throughout the summation, and the same for
column subscripts.


We define generalized symmetric means to be averages of monomial functions
over a matrix; i.e., a g.s.m. is a polynomial

$$\frac{1}{M}\sum^{\neq} x_{pq}^{a_{pq}} \cdots x_{st}^{a_{st}}, \tag{8}$$

where M is the number of terms in the summation. A g.s.m. is specified by the
exponents, together with information which tells which ones correspond to
elements that lie in the same row, and which ones correspond to elements that
lie in the same column. A convenient notation for g.s.m.'s is thus provided by
placing the exponents in a matrix within brackets in such a way that exponents
which affect elements in the same row of the matrix $\|x_{ij}\|$ are entered in the same
row, and similarly for columns. Thus,³


.e-8e.. 1 a or 
ls 0 1 = ES DES eS HL Mute. 

Ordinarily the zeros will be replaced by dashes. Dashes will also be used to ex- 
tend every matrix of entries to at least two rows and two columns to avoid con- 
fusion with symmetric means and, when parentheses are later introduced, to 
avoid confusion with binomial coefficients. Thus, 


$ -lJtews 
? i]-52r 
rn “| 7 _ ij Thi « 


2 - re(r — 1) 
Evidently two g.s.m.’s are identical if the matrix of entries of one can be ob- 
tained from that of the other by permuting rows and/or columns. The distinct 
g.s.m.’s of degrees 1 and 2 are as follows: 


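Degree 1: $\begin{bmatrix} 1 & - \\ - & - \end{bmatrix}$

Degree 2: $\begin{bmatrix} 2 & - \\ - & - \end{bmatrix}$, $\begin{bmatrix} 1 & 1 \\ - & - \end{bmatrix}$, $\begin{bmatrix} 1 & - \\ 1 & - \end{bmatrix}$, $\begin{bmatrix} 1 & - \\ - & 1 \end{bmatrix}$

³ Square brackets are used, for convenience in printing, in place of $\langle\ \rangle$'s for g.s.m.'s having
more than one row.
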


The idea of random pairing for bisamples is a straightforward extension of
that described in Section 3 for samples: Given two r × c matrices $\|x_{ij}\|$ and
$\|y_{ij}\|$, the order of rows and of columns is randomized in each, and a new r × c
matrix $\|z_{ij}\|$ is formed by matrix addition of the results.
The general term

$$x_{pq}^{a_{pq}} \cdots x_{st}^{a_{st}}$$

(which specifies, as in (8), a g.s.m. of degree m) contains m factors, $a_{pq}$ of which
are equal to $x_{pq}$, etc. To each of these factors we assign a different symbol, and
the resulting set of symbols may be partitioned in two ways: once by rows, and
once by columns. The secondary notation for the g.s.m. will then be an ordered
pair

$$\langle\alpha/\beta\rangle$$

of partitions $\alpha$ and $\beta$, each on the same set of symbols. Each part of $\alpha$ will consist
of those symbols which correspond to factors having a particular row subscript,
and the parts of $\beta$ are similarly determined by column subscripts. For
example,

$$\begin{bmatrix} 2 & 1 \\ - & 1 \end{bmatrix} = \frac{1}{rc(r-1)(c-1)}\sum^{\neq} x_{pq}^{2}\, x_{pt}\, x_{st}$$

becomes, in the secondary notation,

$$\langle abd, e/ab, de\rangle.$$


To establish the property of inheritance on the average, let

$$\langle\alpha/\beta\rangle' = \frac{1}{M}\sum^{\neq} x_{pq}^{a_{pq}} \cdots x_{st}^{a_{st}}$$

represent any g.s.m. for an R × C population. This is the average, with equal
weights, of all terms $B = x_{pq}^{a_{pq}} \cdots x_{st}^{a_{st}}$. If $\langle\alpha/\beta\rangle$ represents the same g.s.m. for
an r × c bisample, one or more of the expansions of $\langle\alpha/\beta\rangle$ over various bisamples
will contain any given term B. Hence ave $\langle\alpha/\beta\rangle$ is a weighted average of all
terms B, and it follows from the symmetry of the set of all r × c bisamples
that the weights in this average are also equal, so that ave $\langle\alpha/\beta\rangle = \langle\alpha/\beta\rangle'$.
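Inheritance on the average for a g.s.m. can be confirmed by exhaustive enumeration on a small matrix. The following is a sketch under stated assumptions: the population matrix, the exponent matrix, and the function names `gsm` and `bisamples` are hypothetical choices made for illustration. A g.s.m. is computed as the plain average of the monomial over all assignments of distinct rows and distinct columns, and its average over all r × c bisamples is compared with the population value.

```python
from fractions import Fraction
from itertools import combinations, permutations

def gsm(x, expo):
    """Generalized symmetric mean of matrix x for the exponent matrix expo.

    Rows of `expo` are assigned to distinct rows of x and columns of `expo`
    to distinct columns of x; a zero exponent plays the role of a dash.
    """
    total, count = Fraction(0), 0
    for rows in permutations(range(len(x)), len(expo)):
        for cols in permutations(range(len(x[0])), len(expo[0])):
            term = Fraction(1)
            for a, i in enumerate(rows):
                for b, j in enumerate(cols):
                    term *= Fraction(x[i][j]) ** expo[a][b]
            total += term
            count += 1
    return total / count

def bisamples(pop, r, c):
    """Yield every r x c bisample: r rows and c columns of the population matrix."""
    for rows in combinations(range(len(pop)), r):
        for cols in combinations(range(len(pop[0])), c):
            yield [[pop[i][j] for j in cols] for i in rows]

pop = [[1, 4, 2, 0],
       [3, 5, 9, 1],
       [2, 7, 6, 8]]            # hypothetical 3 x 4 population matrix
expo = [[1, 1],
        [0, 1]]                 # the g.s.m. with entries 1 1 / - 1

values = [gsm(b, expo) for b in bisamples(pop, 2, 3)]
ave = sum(values) / len(values)
print(ave == gsm(pop, expo))
```

Exact fractions are used so that the comparison is an identity rather than a floating-point approximation.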
THEOREM 3. For any g.s.m. $\langle\alpha/\beta\rangle$, the pairing formula is

$$\text{ave aver}\,\langle\alpha/\beta\rangle = \sum \langle\gamma/\delta\rangle'\langle\lambda/\mu\rangle'',$$

where the summation extends over all distinct dichotomies $\{\gamma, \lambda\}$ of $\alpha$ and $\{\delta, \mu\}$ of
$\beta$, $\gamma$ and $\delta$ consisting of the same symbols.

The proof of this theorem, being virtually identical with that of Theorem 1,
will be omitted.


5. Definition of the bipolykays. In order to make the general definition of
bipolykays, we define a “dot-multiplication” for symmetric means as follows:

$$\langle\alpha\rangle\cdot\langle\beta\rangle = \begin{cases} \langle\alpha/\beta\rangle & \text{if } \alpha \text{ and } \beta \text{ consist of the same symbols,}\\ 0 & \text{otherwise.} \end{cases}$$

This noncommutative multiplication can be extended by distributivity to provide
dot-products of linear combinations of symmetric means.

DEFINITION. The bipolykay $(\alpha/\beta)$, where $\alpha$ and $\beta$ are partitions of the same set
of symbols, is

$$(\alpha/\beta) = (\alpha)\cdot(\beta),$$

it being understood that $(\alpha)$ and $(\beta)$ are expressed as sums of symmetric means
(as in the example just before Theorem 2, Section 3) before the dot-product is
taken.


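EXAMPLE. Consider the bipolykay $\begin{pmatrix} 1 & 1 \\ - & 1 \end{pmatrix}$. (The primary notation for a bipolykay
$(\alpha/\beta)$ is the same as that for the g.s.m. $\langle\alpha/\beta\rangle$, with $\langle\ \rangle$'s replaced by parentheses.)
This becomes, in the secondary notation,

$$\begin{aligned}
\begin{pmatrix} 1 & 1 \\ - & 1 \end{pmatrix} &= (pq, s/p, qs)\\
&= (pq, s)\cdot(p, qs) && \text{by the definition above}\\
&= [\langle s, pq\rangle - \langle s, p, q\rangle]\cdot[\langle p, qs\rangle - \langle p, q, s\rangle] && \text{by the example preceding Theorem 2}\\
&= \langle s, pq/p, qs\rangle - \langle s, p, q/p, qs\rangle - \langle s, pq/p, q, s\rangle + \langle s, p, q/p, q, s\rangle\\
&= \begin{bmatrix} 1 & 1 \\ - & 1 \end{bmatrix} - \begin{bmatrix} 1 & - \\ - & 1 \\ - & 1 \end{bmatrix} - \begin{bmatrix} 1 & 1 & - \\ - & - & 1 \end{bmatrix} + \begin{bmatrix} 1 & - & - \\ - & 1 & - \\ - & - & 1 \end{bmatrix}.
\end{aligned}$$
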
Since bipolykays are linear combinations (with constant coefficients) of the 
g.s.m.’s, the bipolykays must also be inherited on the average. By means of 
the device of ranking (as was done for polykays in Section 3), one can show 
that the g.s.m.’s can in turn be expressed as linear combinations of bipolykays. 
(This is done explicitly through degree 4 in Section 8.) Hence any polynomial 
symmetric function of elements of a bisample can be expressed as a linear com- 
bination of bipolykays. 
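As a worked illustration of the definition, the four bipolykays of degree 2 can be expanded directly. This is a derivation sketched here from the dot-multiplication rule together with the degree-2 relations $(pq) = \langle pq\rangle - \langle p, q\rangle$ and $(p, q) = \langle p, q\rangle$ implied by (2) and (6); it is not a display reproduced from the paper.

```latex
\begin{align*}
(pq/pq)   &= [\langle pq\rangle - \langle p,q\rangle]\cdot[\langle pq\rangle - \langle p,q\rangle]
           = \langle pq/pq\rangle - \langle pq/p,q\rangle - \langle p,q/pq\rangle + \langle p,q/p,q\rangle,\\
(pq/p,q)  &= [\langle pq\rangle - \langle p,q\rangle]\cdot\langle p,q\rangle
           = \langle pq/p,q\rangle - \langle p,q/p,q\rangle,\\
(p,q/pq)  &= \langle p,q\rangle\cdot[\langle pq\rangle - \langle p,q\rangle]
           = \langle p,q/pq\rangle - \langle p,q/p,q\rangle,\\
(p,q/p,q) &= \langle p,q\rangle\cdot\langle p,q\rangle
           = \langle p,q/p,q\rangle.
\end{align*}
```

Adding the four lines gives $\langle pq/pq\rangle$, which shows on this small case how each g.s.m. is in turn a sum of bipolykays.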


6. Pairing formulas for bipolykays. The statement of pairing formulas for
bipolykays requires the following terminology:

DEFINITION. The bipolykay $(\alpha/\beta)$ is said to be decomposable if there exist simple
dichotomies $\{\alpha_1, \alpha_2\}$ and $\{\beta_1, \beta_2\}$ of $\alpha$ and $\beta$, respectively, such that $\alpha_1$ and $\beta_1$
consist of the same symbols, and neither $\alpha_1$ nor $\alpha_2$ is null. In this case, $(\alpha/\beta)$
may be written as a product

$$(\alpha/\beta) = (\alpha_1/\beta_1) \times (\alpha_2/\beta_2),$$

where the commutative operation denoted by × is defined by this equation,
and $(\alpha_1/\beta_1)$ and $(\alpha_2/\beta_2)$ are called components of $(\alpha/\beta)$. If any component is
similarly decomposable, the original bipolykay can be written as the ×-product
of at least three components, and clearly any decomposable bipolykay can
finally be written as the ×-product of indecomposable components where the
set of indecomposable components is unique except for order.


THEOREM 4. If a bipolykay $(\alpha/\beta)$ is indecomposable, its pairing formula is
simply

$$\text{ave aver}\,(\alpha/\beta) = (\alpha/\beta)' + (\alpha/\beta)''.$$

If $(\alpha/\beta)$ is decomposable and is the ×-product of indecomposable components
$(\alpha_i/\beta_i)$, $i = 1, 2, \cdots, d$, the pairing formula is

$$\text{ave aver}\,(\alpha/\beta) = (\alpha/\beta)' + (\alpha/\beta)'' + \sum (\gamma/\delta)'(\lambda/\mu)'',$$

where the summation extends over all expressions for which $(\gamma/\delta)$ is the ×-product
of $1, 2, \cdots$, or $d - 1$ of the $(\alpha_i/\beta_i)$ and $(\lambda/\mu)$ is the ×-product of the remaining
ones.

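EXAMPLES:

$$\begin{aligned}
\text{ave aver}\begin{pmatrix} 1 & 1 \\ - & 1 \end{pmatrix} &= \text{ave aver}\,(pq, s/p, qs)\\
&= (pq, s/p, qs)' + (pq, s/p, qs)'' && \text{since this is indecomposable}\\
&= \begin{pmatrix} 1 & 1 \\ - & 1 \end{pmatrix}' + \begin{pmatrix} 1 & 1 \\ - & 1 \end{pmatrix}'',
\end{aligned}$$

$$\begin{aligned}
\text{ave aver}\begin{pmatrix} 1 & 1 & - \\ - & - & 1 \end{pmatrix} &= \text{ave aver}\,(pq, s/p, q, s)\\
&= \text{ave aver}\,[(pq/p, q) \times (s/s)]\\
&= (pq, s/p, q, s)' + (pq, s/p, q, s)'' + (pq/p, q)'(s/s)'' + (s/s)'(pq/p, q)''\\
&= \begin{pmatrix} 1 & 1 & - \\ - & - & 1 \end{pmatrix}' + \begin{pmatrix} 1 & 1 & - \\ - & - & 1 \end{pmatrix}'' + \begin{pmatrix} 1 & 1 \\ - & - \end{pmatrix}'\begin{pmatrix} 1 & - \\ - & - \end{pmatrix}'' + \begin{pmatrix} 1 & - \\ - & - \end{pmatrix}'\begin{pmatrix} 1 & 1 \\ - & - \end{pmatrix}''.
\end{aligned}$$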




(Note: A decomposable bipolykay in primary notation can easily be recognized,
as its matrix of entries can be put in the form

$$\left(\begin{array}{c|c} A & B \\ \hline C & D \end{array}\right)$$

where A, B, C, D are matrices, with all elements of B and C zero.)

The remainder of this section will be given over to the proof of Theorem 4.
As before, asterisks will indicate bipolykays (or g.s.m.'s) for bisamples, and
primes will denote population values; if a certain population is indicated by n
primes, a bisample from that population will be indicated by n asterisks.


DEFINITION (EXTENDING THE DOT-MULTIPLICATION). In dealing with two
different populations, we define

$$[\langle\alpha\rangle^{*}\langle\beta\rangle^{**}]\cdot[\langle\gamma\rangle^{*}\langle\delta\rangle^{**}] = [\langle\alpha\rangle\cdot\langle\gamma\rangle]^{*}\,[\langle\beta\rangle\cdot\langle\delta\rangle]^{**},$$

and by extension this provides a meaning for any expression which is formally
written as a dot product of linear combinations of terms of the type $\langle\alpha\rangle^{*}\langle\beta\rangle^{**}$.
Asterisks may be replaced by primes. (Note: Since we are dealing with matrices,
the terms $\langle\alpha\rangle^{*}\langle\beta\rangle^{**}$ themselves have no meaning.)

LEMMA 1. ave aver $[\langle\alpha\rangle\cdot\langle\beta\rangle]$ = [ave aver $\langle\alpha\rangle$]·[ave aver $\langle\beta\rangle$], and ave aver
$[(\alpha)\cdot(\beta)]$ = [ave aver $(\alpha)$]·[ave aver $(\beta)$]. (Here $\langle\alpha\rangle\cdot\langle\beta\rangle = \langle\alpha/\beta\rangle$ is a g.s.m. for
a sample formed by random pairing of two bisamples. The expressions ave aver
$\langle\alpha\rangle$ and ave aver $\langle\beta\rangle$ are formal expressions of Theorem 1, their dot product
having meaning only from the definition just above. Similarly for the polykays,
which must be expressed as sums of g.s.m.'s before the above definition gives
them meaning.)

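PROOF. If $\alpha$ and $\beta$ do not consist of the same symbols, the result is trivial.
If they do, then $\langle\alpha\rangle\cdot\langle\beta\rangle = \langle\alpha/\beta\rangle$, and so

$$\text{ave aver}\,[\langle\alpha\rangle\cdot\langle\beta\rangle] = \sum \langle\gamma/\delta\rangle'\langle\lambda/\mu\rangle'',$$

where $\gamma, \delta, \lambda, \mu$ are as described in Theorem 3. Clearly, from Theorem 1,
[ave aver $\langle\alpha\rangle$]·[ave aver $\langle\beta\rangle$] gives the same sum, so we may now go to the second
part of this lemma. In the case of polykays, we have, by their definition,

$$(\alpha) = \sum_i a_i\langle\alpha_i\rangle, \qquad (\beta) = \sum_j b_j\langle\beta_j\rangle,$$

where $\alpha_i$ is $\alpha$ or a subpartition of $\alpha$, and the $\beta_j$ bear the same relation to $\beta$. Hence

$$\text{ave aver}\,[(\alpha)\cdot(\beta)] = \sum_i\sum_j a_i b_j\ \text{ave aver}\,[\langle\alpha_i\rangle\cdot\langle\beta_j\rangle],$$

and

$$[\text{ave aver}\,(\alpha)]\cdot[\text{ave aver}\,(\beta)] = \sum_i\sum_j a_i b_j\,[\text{ave aver}\,\langle\alpha_i\rangle]\cdot[\text{ave aver}\,\langle\beta_j\rangle].$$

By the first part of the lemma these are the same, and so the second part is
proved.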

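LEMMA 2. $(\alpha)'(\beta)''\cdot(\gamma)'(\delta)'' = [(\alpha)\cdot(\gamma)]'\,[(\beta)\cdot(\delta)]''$. (Each side of this equation
has the meaning that is provided by the above conventions after each polykay has
been written as a linear combination of symmetric means.)

PROOF. As in Lemma 1, we write

$$(\alpha)' = \sum_i a_i\langle\alpha_i\rangle', \quad (\beta)'' = \sum_j b_j\langle\beta_j\rangle'', \quad (\gamma)' = \sum_k c_k\langle\gamma_k\rangle', \quad (\delta)'' = \sum_m d_m\langle\delta_m\rangle''.$$

Then

$$\begin{aligned}
(\alpha)'(\beta)''\cdot(\gamma)'(\delta)'' &= \sum_i\sum_j a_i b_j\langle\alpha_i\rangle'\langle\beta_j\rangle'' \cdot \sum_k\sum_m c_k d_m\langle\gamma_k\rangle'\langle\delta_m\rangle''\\
&= \sum_i\sum_j\sum_k\sum_m a_i b_j c_k d_m\,\langle\alpha_i\rangle'\langle\beta_j\rangle''\cdot\langle\gamma_k\rangle'\langle\delta_m\rangle'' && \text{by definition}\\
&= \sum_i\sum_k a_i c_k\,[\langle\alpha_i\rangle\cdot\langle\gamma_k\rangle]'\ \sum_j\sum_m b_j d_m\,[\langle\beta_j\rangle\cdot\langle\delta_m\rangle]''\\
&= [(\alpha)\cdot(\gamma)]'\,[(\beta)\cdot(\delta)]'',
\end{aligned}$$

proving Lemma 2.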
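To prove Theorem 4, we write

$$\begin{aligned}
\text{ave aver}\,(\alpha/\beta) &= \text{ave aver}\,[(\alpha)\cdot(\beta)] && \text{by definition}\\
&= [\text{ave aver}\,(\alpha)]\cdot[\text{ave aver}\,(\beta)] && \text{by Lemma 1}\\
&= \sum (\lambda_\alpha)'(\mu_\alpha)'' \cdot \sum (\lambda_\beta)'(\mu_\beta)'' && \text{by Theorem 2,}
\end{aligned}$$

where the first sum extends over all simple dichotomies $\{\lambda_\alpha, \mu_\alpha\}$ of $\alpha$, and similarly
for the second sum. Hence

$$\text{ave aver}\,(\alpha/\beta) = \sum\sum\,[(\lambda_\alpha)'\cdot(\lambda_\beta)']\,[(\mu_\alpha)''\cdot(\mu_\beta)''],$$

by Lemma 2. Now $\lambda_\alpha$ is a partition consisting of some of the parts of $\alpha$, with no
other changes made, since $\{\lambda_\alpha, \mu_\alpha\}$ is a simple dichotomy of $\alpha$. Similarly $\lambda_\beta$ consists
of some of the parts of $\beta$. The expression $(\lambda_\alpha)'\cdot(\lambda_\beta)'$ vanishes unless $\lambda_\alpha$ and
$\lambda_\beta$ comprise exactly the same symbols. Thus the only nonvanishing terms arise
when

(a) $\lambda_\alpha = \alpha$ and $\lambda_\beta = \beta$, producing the term
$[(\lambda_\alpha)'\cdot(\lambda_\beta)']\,[(\phi)''\cdot(\phi)''] = (\alpha/\beta)'$, $\phi$ being null; or
(b) $\lambda_\alpha = \lambda_\beta = \phi$, producing the term $(\alpha/\beta)''$; or
(c) $\lambda_\alpha = \lambda_\beta$ and $\mu_\alpha = \mu_\beta$ and none of these is null.

The last case cannot happen to an indecomposable bipolykay. If the bipolykay
is decomposable, case (c) gives exactly the various terms that correspond to the
splitting of the bipolykay into indecomposable components, and Theorem 4 is
established.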
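Both halves of Theorem 4 can be checked by enumeration for degree 2. The sketch below rests on assumptions introduced here: the two 3 × 3 population matrices are hypothetical, the helper names are illustrative, and the degree-2 bipolykays are written out from the definition $(\alpha/\beta) = (\alpha)\cdot(\beta)$ rather than copied from a table. The bipolykay $(pq/pq)$ is indecomposable, while $(p, q/p, q) = (p/p) \times (q/q)$ is decomposable, so its pairing formula picks up two cross terms, each a product of the two population means.

```python
from fractions import Fraction
from itertools import combinations, permutations

def gsm(x, expo):
    """Generalized symmetric mean: average over distinct rows and distinct columns."""
    total, count = Fraction(0), 0
    for rows in permutations(range(len(x)), len(expo)):
        for cols in permutations(range(len(x[0])), len(expo[0])):
            term = Fraction(1)
            for a, i in enumerate(rows):
                for b, j in enumerate(cols):
                    term *= Fraction(x[i][j]) ** expo[a][b]
            total += term
            count += 1
    return total / count

def bk_cell(x):
    """Bipolykay (pq/pq), expanded into g.s.m.'s from the definition."""
    return (gsm(x, [[2]]) - gsm(x, [[1, 1]])
            - gsm(x, [[1], [1]]) + gsm(x, [[1, 0], [0, 1]]))

def bk_diag(x):
    """Bipolykay (p,q/p,q), which coincides with the g.s.m. <p,q/p,q>."""
    return gsm(x, [[1, 0], [0, 1]])

def mean(x):
    return gsm(x, [[1]])

def bisamples(pop, r, c):
    for rows in combinations(range(len(pop)), r):
        for cols in combinations(range(len(pop[0])), c):
            yield [[pop[i][j] for j in cols] for i in rows]

def ave_aver(stat, pop_x, pop_y, r, c):
    """Average stat over all bisample pairs and all relative row/column orders."""
    total, count = Fraction(0), 0
    for bx in bisamples(pop_x, r, c):
        for by in bisamples(pop_y, r, c):
            for rp in permutations(range(r)):
                for cp in permutations(range(c)):
                    z = [[bx[i][j] + by[rp[i]][cp[j]] for j in range(c)]
                         for i in range(r)]
                    total += stat(z)
                    count += 1
    return total / count

X = [[1, 4, 2], [3, 5, 9], [2, 7, 6]]   # hypothetical population (one prime)
Y = [[0, 2, 5], [8, 1, 3], [4, 4, 7]]   # hypothetical population (two primes)

indecomposable_ok = ave_aver(bk_cell, X, Y, 2, 2) == bk_cell(X) + bk_cell(Y)
decomposable_ok = (ave_aver(bk_diag, X, Y, 2, 2)
                   == bk_diag(X) + bk_diag(Y) + 2 * mean(X) * mean(Y))
print(indecomposable_ok, decomposable_ok)
```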


7. Pairing formulas for certain special cases. Various special cases and de- 
generate cases arise in connection with pairing when applied to the analysis of 
variance. In order to deal with some of these, we need first a lemma 
about polykays and then a theorem: 

LEMMA 3. If $k_{mn\cdots p}$ is any polykay, the coefficients in its expression as a linear
combination of symmetric means add to 0 unless $m = n = \cdots = p = 1$.

PROOF. Consider a population, or sample, all of whose elements are equal to 1.
Then clearly every symmetric mean has the value 1, and the value of $k_{mn\cdots p}$ is
the sum of the coefficients in its expression as a linear combination of symmetric
means. We have only to show that in this case $k_{mn\cdots p} = 0$ unless

$$m = n = \cdots = p = 1.$$

Looking at equation (6), we see that, when all parts of $\alpha$ are of length 1,
$\langle\alpha\rangle = (\alpha)$, i.e.,

$$k_{11\cdots 1} = \langle 11\cdots 1\rangle,$$

and, in the present case, each of these equals 1. If $\alpha$ has one part of length 2,
and all others are of length 1, then

$$\langle 11\cdots 12\rangle = (11\cdots 12) + (11\cdots 1);$$

or $1 = (11\cdots 12) + 1$ in the present case, so that $(11\cdots 12)$ is 0. We can now
prove the theorem by induction on rank, supposing that, for a given equation
of type (6),

$$\langle\alpha\rangle = (\alpha) + \sum (\beta_a),$$

all polykays of rank less than that of $(\alpha)$ are 0 except for $(11\cdots 1)$. This equation
then becomes

$$1 = (\alpha) + 1,$$

and $(\alpha) = 0$.
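The lemma can be read off directly from the polykays of degree 3 or less listed in (2). The following check simply sums the coefficients appearing there; nothing beyond (2) is assumed.

```latex
\begin{align*}
k_2    &= \langle 2\rangle - \langle 11\rangle                          &&\Rightarrow\ 1 - 1 = 0,\\
k_{12} &= \langle 12\rangle - \langle 111\rangle                        &&\Rightarrow\ 1 - 1 = 0,\\
k_3    &= \langle 3\rangle - 3\langle 12\rangle + 2\langle 111\rangle   &&\Rightarrow\ 1 - 3 + 2 = 0,\\
k_1 = \langle 1\rangle,\quad k_{11} &= \langle 11\rangle,\quad k_{111} = \langle 111\rangle &&\Rightarrow\ \text{coefficient sum } 1.
\end{align*}
```

Only the polykays whose subscripts are all ones keep a nonzero coefficient sum, in agreement with the statement of the lemma.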


THEOREM 5. Consider a bisample in which all elements in the same row are
equal, i.e.,

$$x_{ij} = x_i.$$

Over this matrix, a bipolykay $(\alpha/\beta)$ (a) is equal to the polykay $(\alpha)$, defined over the
set of $x_i$, if all parts of the partition $\beta$ are of length 1 (i.e., if, in the primary notation,
all entries are ones in different columns); or (b) is equal to 0 otherwise.

Obviously an analogous statement applies to a bisample with constant columns.





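PROOF. In this case it is obvious that any g.s.m. $\langle\alpha/\beta\rangle$ is equal to $\langle\alpha\rangle$, the latter
being defined over the set of $x_i$. Now if $(\alpha/\beta)$ is any bipolykay, we can write

$$\begin{aligned}
(\alpha/\beta) &= (\alpha)\cdot(\beta)\\
&= \sum_i a_i\langle\alpha_i\rangle \cdot \sum_j b_j\langle\beta_j\rangle && \text{by definition}\\
&= \sum_i\sum_j a_i b_j\,\langle\alpha_i/\beta_j\rangle\\
&= \sum_j b_j \sum_i a_i\langle\alpha_i\rangle, && \text{by remark at beginning of this paragraph.}
\end{aligned}$$

This last expression vanishes unless $\sum_j b_j \neq 0$, i.e., unless (Lemma 3) all parts
of the partition $\beta$ are of length 1. Hence $\sum_j b_j$ is 1 when it is not 0, and in this
case

$$(\alpha/\beta) = \sum_i a_i\langle\alpha_i\rangle = (\alpha).$$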
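Theorem 5 can be illustrated on a small matrix with constant rows. This is a sketch under assumptions made here: the row values, the number of columns, and the helper names are hypothetical, and the degree-2 bipolykays are expanded from the definition $(\alpha/\beta) = (\alpha)\cdot(\beta)$. With $\beta = (p, q)$ the bipolykay $(pq/p, q)$ should reduce to $k_2$ of the row values, while $(pq/pq)$ and $(p, q/pq)$, whose $\beta$ has a part of length 2, should vanish.

```python
from fractions import Fraction
from itertools import permutations

def sym_mean(values, exponents):
    """Symmetric mean over distinct indices of a one-way sample."""
    total, count = Fraction(0), 0
    for chosen in permutations(values, len(exponents)):
        term = Fraction(1)
        for x, e in zip(chosen, exponents):
            term *= Fraction(x) ** e
        total += term
        count += 1
    return total / count

def gsm(x, expo):
    """Generalized symmetric mean over distinct rows and distinct columns."""
    total, count = Fraction(0), 0
    for rows in permutations(range(len(x)), len(expo)):
        for cols in permutations(range(len(x[0])), len(expo[0])):
            term = Fraction(1)
            for a, i in enumerate(rows):
                for b, j in enumerate(cols):
                    term *= Fraction(x[i][j]) ** expo[a][b]
            total += term
            count += 1
    return total / count

row_values = [5, 1, 2, 9]                       # hypothetical x_i
mat = [[xi] * 3 for xi in row_values]           # 4 x 3 matrix with constant rows

diag = gsm(mat, [[1, 0], [0, 1]])               # <p,q/p,q>
b_same_row = gsm(mat, [[1, 1]]) - diag          # (pq/p,q): beta has parts of length 1
b_same_col = gsm(mat, [[1], [1]]) - diag        # (p,q/pq): beta has a part of length 2
b_same_cell = (gsm(mat, [[2]]) - gsm(mat, [[1, 1]])
               - gsm(mat, [[1], [1]]) + diag)   # (pq/pq)

k2 = sym_mean(row_values, (2,)) - sym_mean(row_values, (1, 1))
print(b_same_row == k2, b_same_col == 0, b_same_cell == 0)
```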


The special cases which we now wish to consider are as follows:

CASE I (CONSTANT ROWS). Theorem 5 shows that in this case a bipolykay
$(\alpha/\beta)$ is given by

$$(\alpha/\beta) = \begin{cases} k_{mn\cdots p} & \text{if all parts of } \beta \text{ are of length 1,}\\ 0 & \text{otherwise,} \end{cases}$$

$m, n, \cdots, p$ being the lengths of the parts of $\alpha$.

CASE II (CONSTANT COLUMNS). Here, of course,

$$(\alpha/\beta) = \begin{cases} k_{mn\cdots p} & \text{if all parts of } \alpha \text{ are of length 1,}\\ 0 & \text{otherwise.} \end{cases}$$

CASE III (CONSTANT ROWS AND COLUMNS). Here all elements of the bisample
are equal. It follows that

$$(\alpha/\beta) = \begin{cases} k_{11\cdots 1} & \text{if all parts of } \alpha \text{ and } \beta \text{ have length 1,}\\ 0 & \text{otherwise.} \end{cases}$$

If d is the common value of the elements of the bisample, and m is the degree of
$(\alpha/\beta)$, then $(\alpha/\beta) = d^m$ when it does not vanish.

These cases might arise, for example, in connection with a linear model such
as

$$u_{ij} = m + x_i + y_j + z_{ij},$$

where the $x_i$ can be thought of as a bisample from a matrix with constant rows,
the $y_j$ from a matrix with constant columns, m from a matrix with all elements
equal, and $z_{ij}$ from an arbitrary matrix representing “cell effects”. Using 0, 1, 2,
3, and 4 primes (asterisks) for the populations (bisamples) respectively associated
with u, m, x, y, and z, we find the pairing formula for any indecomposable
bipolykay $(\alpha/\beta)$, for example, to be

$$\text{ave aver}\,(\alpha/\beta) = (\alpha/\beta)' + (\alpha/\beta)'' + (\alpha/\beta)''' + (\alpha/\beta)'''',$$

and the cases above would tell us that some of these terms are zero and others
are equivalent to polykays, depending on what bipolykay $(\alpha/\beta)$ represents.

Instead of sampling from a matrix, one may wish to consider a degenerate
case in which the population consists only of rows, with no column designations;
i.e., an r × c bisample is chosen by a selection of r rows followed by a selection
of c elements from each of the r rows chosen. There is also the completely degenerate
case, with no rows or columns, so that an r × c bisample is just an
ordinary sample, randomly arranged, of rc elements from a set of numbers.
This case, which would apply, for example, to the z's in the linear model above
if they were regarded as independently sampled “random errors” instead of
fixed interactions, is designated as Case IV:


Case IV. When pairing a bisample with a completely degenerate bisample, 
we have only to notice that randomization in the latter is not restricted to rows 
and columns, so that we have, for the completely degenerate case, these results: 

(a) All g.s.m.’s with the same entries (primary notation) have equal aver- 
ages for randomization, e.g 


5*) 


> 3- 8 
aver; _ on 


(b) All bipolykays vanish on the average except those having only diagonal
elements (in secondary notation, this means those $(\alpha/\beta)$ such that $\alpha$ and $\beta$ are
equivalent partitions). This statement can be verified for degrees $\leq 4$ by observing
that, in the relevant conversion formulas (Section 8), the coefficients of
g.s.m.'s with the same entries add to zero.

(c) Bipolykays with only diagonal entries are equal, on the average, to the 
corresponding polykays; e.g., 


) 7 
aver ( ; ) = aver ko, etc. 


8. Conversion formulas for g.s.m.'s and bipolykays. In this and the next
section, tables will be presented which make possible the use of bipolykays up
through degree 4, that is, up through variances of variances. The distinct g.s.m.'s
of degrees 1 and 2 were listed in Section 4. Those of degrees 3 and 4 require more
space (in either notation) and so will be denoted by t's (for degree 3) and f's
(for degree 4) with subscripts, even though this notation is less informative,
as follows:


[Displayed lists defining the degree-3 g.s.m.'s (the t's) and the degree-4 g.s.m.'s (the f's); the matrix entries are not recoverable here.]



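The following conversion formulas apply to the bipolykays of degrees 1
and 2:

Degree 1:

$$\begin{pmatrix} 1 & - \\ - & - \end{pmatrix} = \begin{bmatrix} 1 & - \\ - & - \end{bmatrix}.$$

Degree 2:

$$\begin{aligned}
\begin{pmatrix} 2 & - \\ - & - \end{pmatrix} &= \begin{bmatrix} 2 & - \\ - & - \end{bmatrix} - \begin{bmatrix} 1 & 1 \\ - & - \end{bmatrix} - \begin{bmatrix} 1 & - \\ 1 & - \end{bmatrix} + \begin{bmatrix} 1 & - \\ - & 1 \end{bmatrix},\\
\begin{pmatrix} 1 & 1 \\ - & - \end{pmatrix} &= \begin{bmatrix} 1 & 1 \\ - & - \end{bmatrix} - \begin{bmatrix} 1 & - \\ - & 1 \end{bmatrix},\\
\begin{pmatrix} 1 & - \\ 1 & - \end{pmatrix} &= \begin{bmatrix} 1 & - \\ 1 & - \end{bmatrix} - \begin{bmatrix} 1 & - \\ - & 1 \end{bmatrix},\\
\begin{pmatrix} 1 & - \\ - & 1 \end{pmatrix} &= \begin{bmatrix} 1 & - \\ - & 1 \end{bmatrix}.
\end{aligned}$$
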


The bipolykays of degree 2 have been independently developed by H. Fair- 
field Smith in [5]. 


For degrees 3 and 4 we use notation analogous to that used above for g.s.m.’s, 
letting T’s stand for bipolykays of degree 3 and F’s for bipolykays of degree 4. 


Thus 
ss 22 
T's = ( ) 


F, = ( ; : a, etc. 


The conversion formulas for bipolykays of degrees 3 and 4 become quite
long; but since they are linear, only the coefficients are of interest. These coefficients
are found in Table 1 for degree 3 and in Table 2 for degree 4. The
nature of the formulas makes it possible for one table to present coefficients for
conversion in both directions. The coefficients for any desired expression are
found by reading over (or down) to and including the diagonal of ones. For
example, in Table 1,





TABLE 1
For conversion of g.s.m.'s and bipolykays of degree 3
[Table entries not recoverable.]

T,+ 37;+ 7s, 
and similarly in Table 2. 


9. Multiplication formulas for bipolykays. The usefulness of the property of 
inheritance on the average is pretty well limited to the case where functions 
having this property occur linearly. Any polynomial in bipolykays for one bi- 
sample, however, can be expressed as a linear combination of bipolykays for that 
bisample, given the proper multiplication formulas. We give below the multipli- 
cation formulas for bipolykays, up to and including products of degree 4. 


CJC) eC )et)+aC) 
-)( 1 


TC 


27; + 276 + 2rT; + 2cT2 + reTi, 


, + 2cT. + rT; + rceT;, 


c 
(- 
(1 
* 
(: 


Tio + rT, + cTs + reT; 


TC 


(’ ‘) 2T; + 2rT. + cT, + reT2, 
T, = 6F yw + 3F 2 + 3rF3 + 3cF2 + rceFi, 


) 
J 
| eg: 
) 
6 


T2 2F 3 F415 a F4s + 2F x» oa r(2F a Fy) od c(F, — Fs) oo rcF >, 





TABLE 2
For conversion of g.s.m.'s and bipolykays of degree 4
[Table entries not recoverable.]

“)r 3 = 2Fut Fie t+ Fi + 2Fu + (Fs + Fr) + c(2Fi0 + Fu) + reFs, 
7 T, = 3Fx2 + 3rF 5 + cFs + rceFe, 


e 3F 2, + rFy + 3cFi6 + rcF;, 


= Fy + Fo + Pos + Fo + r(FPu t+ Fis) + (Fis + Fis) + reFw, 


Ts Fx + Fa + r(Fu + Fx) + cF2 + rcF x, 


= Fy + Fe + rFen + c( Fos + Fe) + reFn ’ 


. Fu + Fas + Far + Fos + r(Fp + Fu) + (Fu + Fu) + reFx, 


a Ty = Fy + re + cFu + refs. 


Products of bipolykays of degree 2 are more complicated. The coefficients 
of the bipolykays of degree 4 in the expressions of these products are tabulated 
below, using the following abbreviations: 


=2/[(re(e-1)) g= 
2/(re(r — 1)] h=1/r 
2/[e(e — 1)] k = 1/(re) 
2/[r(r — 1)] p = 1/[re(r — 1)(e — 1)] 


10. Variances of bipolykays of degree 2. The multiplication table of the pre- 
ceding section enables us to find the variances, in taking bisamples from a popu- 
lation matrix, of bipolykays of degree 1 or 2. For example, we have 


1 -\* (~i-\\? 1 -\*\" 
var ( ) = ave aver {( ) > — ave aver (7 yy. 
a Rae eo ‘ 


\ 
From Section 9 we have 


(t=) =? =) +e(t #) 4 a 4) 


\2 


; ein 
ave aver (1 ) \ = ave aver{ 4 (? “)" + 
\ a J ire 
= uty 
7. a. = 


and so 





TABLE 3
Products of bipolykays of degree 2
[Table entries not recoverable.]



and 
( aa 
ce aver C in ( 
1/2-y 1/f11Y .1/f1-Y 1 -\ 
2 Rol. ") i a” ‘) + ay -) + :) ; 
Hence 


oar! f= fh— Be alt «SY 


Proceeding in the same way for the variances of bipolykays of degree 2, we 
obtain the results given in Table 4, which table provides the coefficients of the 
expressions, of the indicated variances and covariances as linear combinations 
of population bipolykays. In order to simplify the tabulation, the following 
expressions are used: 


1 
A — a la I 


ec ke 


2 2 
@—-1e—-1 (M—1)C —}) 
ee NL 
reir — 1) RC(R—1) 

2 


B 





sail sis eal 
rele— 1) RC(C — 1) 
2 a 2 
c(r — 1)(e — 1) C(R — 1)(C — 1) 
2 2 
rr—il1)(e—-1) RR—1(C- 1) 





9 


9 
~ CR —D re(r — Ife - 1) RO(R—1(C— 1) 
2 
zi R(C — 1) 

11. Conclusion. Any symmetric function of elements of a two-way array can 
be expressed as a linear combination of bipolykays, using the multiplication 
formulas of Section 9 where necessary. If there is a linear model involved, then 
the average values of the bipolykays (and hence of the original symmetric func- 
tion) can be found, by means of pairing formulas, in terms of polykays or bi- 
polykays of the populations from which come the components of the linear 
model. A later paper will illustrate the use of these procedures in finding un- 
biased estimators for variance components, as well as the variances of these 
estimators, etc. 





TABLE 4
Variances and covariances of bipolykays of degree 2
[Table entries not recoverable.]


REFERENCES

[1] R. A. Fisher, “Moments and product moments of sampling distributions,” Proc. London
Math. Soc. (2), Vol. 30 (1928-29), pp. 199-238.

[2] R. Hooke, “Sampling from a matrix, with applications to the theory of testing,” Statistical
Research Group, Princeton University, Memorandum Report 53, Nov., 1953.

[3] R. Hooke, “Moments of moments in matrix sampling – an extension of polykays,”
Statistical Research Group, Princeton University, Memorandum Report 55,
April, 1954.

[4] R. Hooke, “The estimation of polykays in the analysis of variance,” Statistical Research
Group, Princeton University, Memorandum Report 56, May, 1954.

[5] H. Fairfield Smith, “Variance components, finite populations, and experimental inference,”
University of North Carolina, Institute of Statistics Mimeo Series No.
135, July, 1955, p. 57.

[6] J. W. Tukey, “Some sampling simplified,” J. Amer. Stat. Assn., Vol. 45 (1950), pp.
501-519.

[7] J. W. Tukey, “Finite sampling simplified,” Statistical Research Group, Princeton
University, Memorandum Report 45, March, 1951.

[8] J. W. Tukey, “Keeping moment-like sampling computations simple,” Ann. Math.
Stat., this issue.

[9] J. Wishart, “Moment coefficients of the k-statistics in samples from a finite population,”
Biometrika, Vol. 39 (1952), pp. 1-13.





SOME APPLICATIONS OF BIPOLYKAYS TO THE ESTIMATION OF
VARIANCE COMPONENTS AND THEIR MOMENTS¹


By Robert Hooke²
Princeton University


1. Summary. Bipolykays were introduced in [3]. They form a family of sym- 
metric (row-wise and column-wise) polynomial functions of the elements of a 
two-way array, with the property of being inherited on the average, and such 
that any similarly symmetric polynomial function of the same numbers can be 
written linearly in terms of the bipolykays. This paper will describe some ap- 
plications of bipolykays to problems in the analysis of variance of two-way 
classifications, using the formulas and tables derived in [3]. A linear model which 
includes contributions from interaction as well as independently sampled cell 
contributions is given in Section 3, and applications are made to certain cases 
of this model. These applications include (a) finding unbiased estimators for the 
variance components in the case of no interaction as well as unbiased estimators 
for the variances of these estimators (Section 6), (b) finding expressions for 
means and variances of some of the functions of degrees 1 and 2 that are of 
interest in the problem of sampling from a matrix (Section 7), and (c) finding 
unbiased estimators for variance components in the general case, including ex- 
pressions for the variances of these estimators in the case of infinite populations 
(Section 8). 


2. Introduction. The purpose of this paper is to describe some uses of bi- 
polykays, which were defined in [3], in connection with problems arising in the 
analysis of variance. A linear model is given in Section 3 and an analysis of vari- 
ance notation, in Section 4. Sections 6, 7, and 8 are given over to derivation of 
results related to the estimation of variance components in various special cases 
of the linear model. 

It will be necessary to make frequent references to [3]. However, in order to 
enable the reader to get the gist of the present paper without referring to [3], 
the remainder of this section is devoted to a summary of definitions and fre- 
quently-used results. 

The symbol $\sum^{\neq}$, for "distinct sum," means a sum taken over all subsequent
subscripts, but with subscripts kept different when they are indicated by different
letters. Thus

\[ \sum_{1}^{2}{}^{\neq}\; x_i x_j = x_1 x_2 + x_2 x_1 . \]


Received July 28, 1954. 
¹ Prepared in connection with research sponsored by the Office of Naval Research.
² Now with the Westinghouse Research Laboratories, East Pittsburgh, Pa.




When the x's are matrix elements, the distinctness relates to row subscripts and
to column subscripts. Thus

\[ \sum_{1}^{2}{}^{\neq}\; x_{ij} x_{ik} = x_{11}x_{12} + x_{12}x_{11} + x_{21}x_{22} + x_{22}x_{21} . \]
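The distinct-sum convention can be illustrated with a minimal Python sketch. The function names and the small data sets are hypothetical and are not taken from the paper; the sketch only assumes that "distinct" means the indicated positions differ, even when the values stored there happen to be equal.

```python
from itertools import permutations

def distinct_sum_vector(x):
    """Sum of x[i] * x[j] over ordered pairs of distinct positions i != j."""
    return sum(x[i] * x[j] for i, j in permutations(range(len(x)), 2))

def distinct_sum_same_row(x):
    """Sum of x[i][j] * x[i][k] over all rows i and ordered column pairs j != k."""
    n_cols = len(x[0])
    return sum(row[j] * row[k]
               for row in x
               for j, k in permutations(range(n_cols), 2))

# Hypothetical data: a vector of two numbers and a 2 x 2 array.
vector_total = distinct_sum_vector([3, 5])        # 3*5 + 5*3
matrix_total = distinct_sum_same_row([[1, 2],
                                      [3, 4]])    # 1*2 + 2*1 + 3*4 + 4*3
```

Ordered pairs are used on purpose, so each unordered product is counted twice, as in the two displayed examples.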


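For a set of numbers $x_1, \dots, x_n$, the symmetric means of degrees 1 and 2
are

\[ \langle 1 \rangle = \sum x_i / n, \qquad
   \langle 11 \rangle = \sum{}^{\neq}\, x_i x_j / n(n-1), \qquad
   \langle 2 \rangle = \sum x_i^2 / n . \]

The polykays are linear combinations of symmetric means denoted by k's with
subscripts. Those of degrees 1 and 2 are defined by

\[ k_1 = \langle 1 \rangle, \qquad
   k_{11} = \langle 11 \rangle, \qquad
   k_2 = \langle 2 \rangle - \langle 11 \rangle . \]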


Symmetric means and polykays are inherited on the average; this means that if
$x_1, \dots, x_n$ are a sample from a population $x_1, \dots, x_N$, and if primes are used
to denote values defined over the population, then

\[ \text{ave}\; k_2 = k_2', \text{ etc.}, \]

where "ave" means average over all possible samples of size n.

If $x_1, \dots, x_n$ is a sample from a population P' (with polykays $k_1'$, $k_{11}'$, etc.)
and $y_1, \dots, y_n$ is a sample from a population P'' (with polykays $k_1''$, $k_{11}''$, etc.),
and if $z_1, \dots, z_n$ is a sample formed by letting $z_i = x_i + y_i$ ($i = 1, 2, \dots, n$),
then

\[ \text{ave aver}\; k_1 = k_1' + k_1'' . \tag{1} \]

Here $k_1$ (with no prime) means a polykay over the z's, "aver" means average
over all possible permutations (or randomizations) of the x's and y's before
adding, and "ave" means average over samples as before. Equation (1) is known
as a pairing formula. Pairing formulas for the polykays of degree 2 are

\[ \text{ave aver}\; k_{11} = k_{11}' + 2k_1'k_1'' + k_{11}'', \]
\[ \text{ave aver}\; k_2 = k_2' + k_2'' . \]
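Inheritance on the average and the pairing formulas can be checked exactly on small finite populations. The following Python sketch is an illustration under stated assumptions, not the author's procedure: the populations are made up, the function names are hypothetical, and "ave aver" is realized by averaging over every ordered sample from each population, which covers both the choice of sample and the randomization of order.

```python
from fractions import Fraction
from itertools import combinations, permutations

def k1(xs):
    """Polykay k_1 = <1>, the mean."""
    return Fraction(sum(xs), len(xs))

def k11(xs):
    """Polykay k_11 = <11>, the mean product over distinct positions."""
    n = len(xs)
    return Fraction(sum(a * b for a, b in permutations(xs, 2)), n * (n - 1))

def k2(xs):
    """Polykay k_2 = <2> - <11>."""
    return Fraction(sum(a * a for a in xs), len(xs)) - k11(xs)

def average_over_samples(stat, population, n):
    """'ave': average of stat over all samples of size n."""
    samples = list(combinations(population, n))
    return sum(stat(s) for s in samples) / len(samples)

def pairing_average(stat, pop_x, pop_y, n):
    """'ave aver': average of stat(z), z_i = x_i + y_i, over all ordered samples."""
    total, count = Fraction(0), 0
    for xs in permutations(pop_x, n):
        for ys in permutations(pop_y, n):
            total += stat([a + b for a, b in zip(xs, ys)])
            count += 1
    return total / count

# Hypothetical integer populations.
population = [1, 4, 6, 9, 15]
pop_x, pop_y = [1, 4, 6, 9], [0, 2, 3, 7]

inherited_k2 = average_over_samples(k2, population, 3)
paired_k2 = pairing_average(k2, pop_x, pop_y, 3)
paired_k11 = pairing_average(k11, pop_x, pop_y, 3)
```

Exact rational arithmetic is used so that the equalities hold identically rather than up to rounding.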

We now let $\|x_{ij}\|$ be a two-way array of numbers ($i = 1, 2, \dots, r$;
$j = 1, 2, \dots, c$) which may be regarded as a bisample from an array $\|x_{IJ}\|$
($I = 1, 2, \dots, R$; $J = 1, 2, \dots, C$), a bisample being chosen from a population
matrix by taking those elements which are at the intersections of a selected
set of r of the R rows and c of the C columns.


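Generalized symmetric means (g.s.m.'s) of degrees 1 and 2 over a bisample
are

\[ \begin{bmatrix}1&-\\-&-\end{bmatrix} = \sum x_{ij} / rc, \]
\[ \begin{bmatrix}1&-\\-&1\end{bmatrix} = \sum{}^{\neq}\, x_{ij} x_{kl} / rc(r-1)(c-1), \]
\[ \begin{bmatrix}1&-\\1&-\end{bmatrix} = \sum{}^{\neq}\, x_{ij} x_{kj} / rc(r-1), \]
\[ \begin{bmatrix}1&1\\-&-\end{bmatrix} = \sum{}^{\neq}\, x_{ij} x_{ik} / rc(c-1), \]
\[ \begin{bmatrix}2&-\\-&-\end{bmatrix} = \sum x_{ij}^2 / rc. \]

Those of degrees 3 and 4 can be expressed in similar notation, but to save space
are indicated by $t_1, t_2, \dots, t_{10}$ for the 10 g.s.m.'s of degree 3 and by $f_1, f_2, \dots, f_{33}$
for the 33 g.s.m.'s of degree 4. An arbitrary symmetric mean may be denoted by
$\langle \|\alpha\| \rangle$, the $\|\alpha\|$ representing the matrix of entries.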

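The bipolykays are linear combinations of the g.s.m.'s, represented in the
same way with parentheses replacing $\langle\ \rangle$'s. Those of degrees 1 and 2 are de-
fined by

\[ \begin{pmatrix}1&-\\-&-\end{pmatrix} = \begin{bmatrix}1&-\\-&-\end{bmatrix}, \]
\[ \begin{pmatrix}1&-\\-&1\end{pmatrix} = \begin{bmatrix}1&-\\-&1\end{bmatrix}, \]
\[ \begin{pmatrix}1&-\\1&-\end{pmatrix} = \begin{bmatrix}1&-\\1&-\end{bmatrix} - \begin{bmatrix}1&-\\-&1\end{bmatrix}, \]
\[ \begin{pmatrix}1&1\\-&-\end{pmatrix} = \begin{bmatrix}1&1\\-&-\end{bmatrix} - \begin{bmatrix}1&-\\-&1\end{bmatrix}, \]
\[ \begin{pmatrix}2&-\\-&-\end{pmatrix} = \begin{bmatrix}2&-\\-&-\end{bmatrix} - \begin{bmatrix}1&1\\-&-\end{bmatrix} - \begin{bmatrix}1&-\\1&-\end{bmatrix} + \begin{bmatrix}1&-\\-&1\end{bmatrix}. \]

Those of degrees 3 and 4 are indicated by $T_1, T_2, \dots, T_{10}$ and $F_1, F_2, \dots, F_{33}$,
respectively. (See Section 8 of [3].) A general bipolykay may be indicated by
$(\|\alpha\|)$.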


Bipolykays and g.s.m.'s are inherited on the average in the sense that

\[ \text{ave} \begin{pmatrix}1&-\\-&1\end{pmatrix} = \begin{pmatrix}1&-\\-&1\end{pmatrix}', \text{ etc.}, \]

the prime and "ave" having the same meanings as for polykays above, with

\[ \begin{pmatrix}1&-\\-&-\end{pmatrix}' = \begin{bmatrix}1&-\\-&-\end{bmatrix}' = \sum x_{IJ} / RC, \text{ etc.} \]

* Square brackets are used, for convenience in printing, in place of $\langle\ \rangle$'s for g.s.m.'s hav-
ing more than one row.



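Pairing formulas have a meaning analogous to that defined above for polykays
and are as follows for bipolykays of degrees 1 and 2, "aver" meaning average
over permutations of rows and of columns:

\[ \text{ave aver} \begin{pmatrix}1&-\\-&-\end{pmatrix} = \begin{pmatrix}1&-\\-&-\end{pmatrix}' + \begin{pmatrix}1&-\\-&-\end{pmatrix}'', \]
\[ \text{ave aver} \begin{pmatrix}1&-\\-&1\end{pmatrix} = \begin{pmatrix}1&-\\-&1\end{pmatrix}' + 2\begin{pmatrix}1&-\\-&-\end{pmatrix}'\begin{pmatrix}1&-\\-&-\end{pmatrix}'' + \begin{pmatrix}1&-\\-&1\end{pmatrix}'', \]
\[ \text{ave aver} \begin{pmatrix}2&-\\-&-\end{pmatrix} = \begin{pmatrix}2&-\\-&-\end{pmatrix}' + \begin{pmatrix}2&-\\-&-\end{pmatrix}'', \]
\[ \text{ave aver} \begin{pmatrix}1&1\\-&-\end{pmatrix} = \begin{pmatrix}1&1\\-&-\end{pmatrix}' + \begin{pmatrix}1&1\\-&-\end{pmatrix}'', \]
\[ \text{ave aver} \begin{pmatrix}1&-\\1&-\end{pmatrix} = \begin{pmatrix}1&-\\1&-\end{pmatrix}' + \begin{pmatrix}1&-\\1&-\end{pmatrix}''. \]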

In general, it is shown in [3] that the pairing formula for a bipolykay $(\|\alpha\|)$ is

\[ \text{ave aver}\; (\|\alpha\|) = (\|\alpha\|)' + (\|\alpha\|)'' \]

unless the matrix $\|\alpha\|$ can be expressed as a direct sum

\[ \|\alpha\| = \begin{pmatrix} \alpha_1 & & 0 \\ & \ddots & \\ 0 & & \alpha_m \end{pmatrix}, \]

where the $\alpha_i$ are matrices. If the $\|\alpha_i\|$ cannot be further broken down, then, in
this case,

\[ \text{ave aver}\; (\|\alpha\|) = (\|\alpha\|)' + (\|\alpha\|)'' + \sum (\|\beta\|)' (\|\gamma\|)'', \]

where the sum extends over all $\|\beta\|$ and $\|\gamma\|$ such that $\|\beta\|$ is the direct sum of
1, 2, ..., or m - 1 of the $\|\alpha_i\|$ and $\|\gamma\|$ is the direct sum of the remaining ones.

The reader is referred to [3] for the general definitions exemplified above, for
multiplication formulas, for conversion formulas for degrees 3 and 4, etc.


3. The linear model. Each example discussed in this paper will be based on a
linear model which is a special case of the following:

\[ x_{ijk} = \theta + \eta_i + \xi_j + \lambda_{ij} + \omega_{ijk} . \tag{2} \]

Here i, j, k run from 1 to r, c, b, respectively. The θ's, η's, ξ's, and ω's are inde-
pendently sampled contributions from populations described in Table 1.

The systematic interactions $\lambda_{ij}$ are not independently sampled but are "tied"
to the η's and ξ's; i.e., the λ's come from an R × C matrix having a row
corresponding to each η and a column to each ξ, so that for each selection of an
$\eta_i$ and a $\xi_j$ there is a unique $\lambda_{ij}$ that accompanies them. Since the η's and ξ's
represent row and column contributions, respectively, it is assumed that the
row means and column means of the matrix of λ's all vanish; otherwise the popu-
lation matrix of λ's is arbitrary and so must be described in terms of bipolykays
rather than polykays.


TABLE 1
Notation associated with model (2)

Contribution              Sample Size   Population Size   Population Polykays

$\theta$, general         1             Arbitrary         $k_1''''$, $k_{11}''''$, etc.
$\eta_i$, row             r             R                 $k_1'$, $k_{11}'$, etc. ($k_1' = 0$)
$\xi_j$, column           c             C                 $k_1''$, $k_{11}''$, etc. ($k_1'' = 0$)
$\omega_{ijk}$, cell      rcb           N                 $k_1'''$, $k_{11}'''$, etc. ($k_1''' = 0$)


4. Analysis of variance notation. The matrix

\[ \|x_{ij}\|, \qquad i = 1, 2, \dots, r;\; j = 1, 2, \dots, c, \]

will represent a bisample from a population matrix

\[ \|x_{IJ}\|, \qquad I = 1, 2, \dots, R;\; J = 1, 2, \dots, C. \]

For any matrix $\|x_{ij}\|$, whether it be a population matrix, a bisample from such
a matrix, or simply a two-way array of sampled numbers such as arises in con-
nection with some linear model, our interests will center around certain families
of symmetric quadratic functions of the x's. Two of these are

(a) the bipolykays $\begin{pmatrix}1&1\\-&-\end{pmatrix}$, $\begin{pmatrix}1&-\\1&-\end{pmatrix}$, $\begin{pmatrix}1&-\\-&1\end{pmatrix}$, $\begin{pmatrix}2&-\\-&-\end{pmatrix}$, and

(b) the various sums of squares and mean squares associated with conven-
tional analysis of variance procedures.

The mean squares (denoted by MS) and sums of squares (denoted by SS)
are defined as follows, where a dot represents an average over the subscript it
replaces:

Designation            Mean Square and Sum of Squares
Rows                   $MSR = SSR/(r-1) = c \sum_i (x_{i\cdot} - x_{\cdot\cdot})^2/(r-1)$
Columns                $MSC = SSC/(c-1) = r \sum_j (x_{\cdot j} - x_{\cdot\cdot})^2/(c-1)$
Residual or balance    $MSB = SSB/(r-1)(c-1) = \sum_i \sum_j (x_{ij} - x_{i\cdot} - x_{\cdot j} + x_{\cdot\cdot})^2/(r-1)(c-1)$
Mean                   $MSM = SSM = \sum_i \sum_j x_{\cdot\cdot}^2 = rc\,x_{\cdot\cdot}^2$
Total                  $SST = \sum_i \sum_j (x_{ij} - x_{\cdot\cdot})^2$

(As always, we have SST = SSR + SSC + SSB.)

When it is desired to emphasize the fact that a population matrix is being
discussed, subscripts will be capital letters, dashes (instead of dots) will indicate
averages over the population, and primes will indicate population values; e.g.,

\[ SST' = \sum_{I=1}^{R} \sum_{J=1}^{C} (x_{IJ} - x_{--})^2 . \]

In dealing with a population matrix we shall use the following quantities:

\[ \theta = x_{--} \quad \bigl(\text{i.e. } \theta = \textstyle\sum\sum x_{IJ}/RC\bigr), \]
\[ \eta_I = x_{I-} - x_{--}, \]
\[ \xi_J = x_{-J} - x_{--}, \]
\[ \lambda_{IJ} = x_{IJ} - x_{I-} - x_{-J} + x_{--} . \]

The elements of the matrix can then be thought of as built up from the η's,
ξ's, λ's, and θ as in model (2), taking the ω's in that model to be 0. We are then
interested in a third family of quadratic symmetric functions, namely

(c) the variance components,

\[ \sigma_R^2 = \text{variance component for rows} = \sum_I (x_{I-} - x_{--})^2/(R-1) = k_2', \]

where $k_2'$ has the meaning assigned in Table 1;

\[ \sigma_C^2 = \text{variance component for columns} = \sum_J (x_{-J} - x_{--})^2/(C-1) = k_2''; \]

\[ \sigma_I^2 = \sum_I \sum_J (x_{IJ} - x_{I-} - x_{-J} + x_{--})^2 / (R-1)(C-1). \]


In this section we shall express the SS’s in terms of the bipolykays (so that 
moments of the former can be easily obtained), the bipolykays in terms of the 
MS's (so that values of the former can be computed by standard techniques
used in computing the latter), and finally the MS’s for a population matrix in 
terms of the variance components (to help in expressing unbiased estimators 
for the latter). These various expressions are derived by elementary algebra, so 
only one example of the derivations will be given: 

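To express SSR in terms of bipolykays, for example, we have

\[ SSR = c \sum_i (x_{i\cdot} - x_{\cdot\cdot})^2 = c \sum_i x_{i\cdot}^2 - rc\,x_{\cdot\cdot}^2 . \]

Since

\[ c \sum_i x_{i\cdot}^2 = c \sum_i \Bigl(\sum_j x_{ij} / c\Bigr)^2
   = \frac{1}{c} \Bigl(\sum x_{ij}^2 + \sum{}^{\neq}\, x_{ij} x_{ik}\Bigr)
   = r \begin{bmatrix}2&-\\-&-\end{bmatrix} + r(c-1) \begin{bmatrix}1&1\\-&-\end{bmatrix}, \]

and since

\[ rc\,x_{\cdot\cdot}^2 = rc \Bigl(\sum_{i,j} x_{ij} / rc\Bigr)^2
   = \frac{1}{rc} \Bigl(\sum x_{ij}^2 + \sum{}^{\neq}\, x_{ij} x_{ik} + \sum{}^{\neq}\, x_{ij} x_{kj} + \sum{}^{\neq}\, x_{ij} x_{kl}\Bigr) \]
\[ = \frac{1}{rc} \Bigl\{ rc \begin{bmatrix}2&-\\-&-\end{bmatrix} + rc(c-1) \begin{bmatrix}1&1\\-&-\end{bmatrix} + rc(r-1) \begin{bmatrix}1&-\\1&-\end{bmatrix}
   + rc(r-1)(c-1) \begin{bmatrix}1&-\\-&1\end{bmatrix} \Bigr\}, \]

we have

\[ SSR = (r-1) \Bigl\{ \begin{bmatrix}2&-\\-&-\end{bmatrix} - \begin{bmatrix}1&-\\1&-\end{bmatrix} + (c-1) \Bigl( \begin{bmatrix}1&1\\-&-\end{bmatrix} - \begin{bmatrix}1&-\\-&1\end{bmatrix} \Bigr) \Bigr\}. \]

The equations defining the bipolykays of degrees 1 and 2 (Section 2) enable us
to express SSR at once as a linear combination of bipolykays. The result, to-
gether with similar ones for the other SS's, is contained in Table 2.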

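The equations represented by the first four rows of Table 2 can be solved to
produce the following inverse relationships, where it is convenient to use MS's
in place of SS's:

\[ \begin{pmatrix}2&-\\-&-\end{pmatrix} = MSB, \tag{3} \]
\[ \begin{pmatrix}1&1\\-&-\end{pmatrix} = (MSR - MSB) / c, \tag{4} \]
\[ \begin{pmatrix}1&-\\1&-\end{pmatrix} = (MSC - MSB) / r, \tag{5} \]
\[ \begin{pmatrix}1&-\\-&1\end{pmatrix} = (MSM - MSR - MSC + MSB) / rc. \tag{6} \]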
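Relations (3) through (6) can be checked numerically. The sketch below is only an illustration: it computes the degree-2 bipolykays by brute force from their definitions in Section 2, computes the mean squares from the table in this section, and compares the two. The function names, the dictionary keys and the test array are hypothetical, and integer entries are assumed so that exact fractions can be used.

```python
from fractions import Fraction
from itertools import product

def degree2_bipolykays(x):
    """Degree-2 bipolykays of an r x c integer array, from the g.s.m. definitions."""
    r, c = len(x), len(x[0])
    cells = list(product(range(r), range(c)))
    s_sq = s_row = s_col = s_diag = 0
    for (i, j), (k, l) in product(cells, repeat=2):
        p = x[i][j] * x[k][l]
        if i == k and j == l:
            s_sq += p        # same cell: squares
        elif i == k:
            s_row += p       # same row, different columns
        elif j == l:
            s_col += p       # same column, different rows
        else:
            s_diag += p      # different rows and different columns
    g_sq = Fraction(s_sq, r * c)
    g_row = Fraction(s_row, r * c * (c - 1))
    g_col = Fraction(s_col, r * c * (r - 1))
    g_diag = Fraction(s_diag, r * c * (r - 1) * (c - 1))
    return {"cell": g_sq - g_row - g_col + g_diag,   # (2 -; - -)
            "rows": g_row - g_diag,                  # (1 1; - -)
            "cols": g_col - g_diag,                  # (1 -; 1 -)
            "diag": g_diag}                          # (1 -; - 1)

def mean_squares(x):
    """MSR, MSC, MSB and MSM of an r x c integer array."""
    r, c = len(x), len(x[0])
    row = [Fraction(sum(x[i]), c) for i in range(r)]
    col = [Fraction(sum(x[i][j] for i in range(r)), r) for j in range(c)]
    grand = Fraction(sum(map(sum, x)), r * c)
    ssr = c * sum((m - grand) ** 2 for m in row)
    ssc = r * sum((m - grand) ** 2 for m in col)
    ssb = sum((x[i][j] - row[i] - col[j] + grand) ** 2
              for i in range(r) for j in range(c))
    return {"MSR": ssr / (r - 1), "MSC": ssc / (c - 1),
            "MSB": ssb / ((r - 1) * (c - 1)), "MSM": r * c * grand ** 2}

x = [[1, 4, 2, 7], [3, 0, 5, 1], [2, 6, 6, 3]]   # hypothetical 3 x 4 array
bp, ms = degree2_bipolykays(x), mean_squares(x)
r, c = len(x), len(x[0])
```

In practice the mean-square route is the cheaper one, which is the point made in Section 9: the degree-2 bipolykays come for free from a standard analysis of variance.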


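TABLE 2
Coefficients for SS's as linear combinations of bipolykays

        $\begin{pmatrix}2&-\\-&-\end{pmatrix}$    $\begin{pmatrix}1&1\\-&-\end{pmatrix}$    $\begin{pmatrix}1&-\\1&-\end{pmatrix}$    $\begin{pmatrix}1&-\\-&1\end{pmatrix}$

SSR     r - 1             c(r - 1)     0            0
SSC     c - 1             0            r(c - 1)     0
SSB     (r - 1)(c - 1)    0            0            0
SSM     1                 c            r            rc
SST     rc - 1            c(r - 1)     r(c - 1)     0
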


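Finally, it follows immediately from the definitions that, for a population
matrix,

\[ \sigma_R^2 = MSR'/C = \begin{pmatrix}1&1\\-&-\end{pmatrix}' + \begin{pmatrix}2&-\\-&-\end{pmatrix}' / C, \tag{7} \]
\[ \sigma_C^2 = MSC'/R = \begin{pmatrix}1&-\\1&-\end{pmatrix}' + \begin{pmatrix}2&-\\-&-\end{pmatrix}' / R, \tag{8} \]
\[ \sigma_I^2 = MSB' = \begin{pmatrix}2&-\\-&-\end{pmatrix}' . \tag{9} \]
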
5. Expectations in the case of no interactions. We suppose now that all the
λ's in (2) are zero, and for the present that b = 1, so that the model is

\[ x_{ij} = \theta + \eta_i + \xi_j + \omega_{ij} . \tag{10} \]

Our first problem is to find the average values of bipolykays defined over the
r × c array of x's (and distinguished here by having no prime or asterisk) in terms
of the polykays (one to four primes) of the four populations described in Table 1.
The procedure is to apply bipolykay pairing formulas to (10). The populations
from which come the η's, ξ's, ω's, and θ are all special cases (numbered I to IV,
respectively, below) and will now be considered one at a time.

Case I. The population of η's is not a matrix population, but it can be thought
of as a matrix whose Ith row is a vector of C components, each equal to $\eta_I$
($I = 1, 2, \dots, R$). Any sample of r η's can then be regarded as a bisample of r
rows and c (arbitrary) columns from this matrix. Bipolykays of this bisample
will be written $(\|\alpha\|)^*$, with a single asterisk. Referring to Case I of Section 7
of [3], we see that

\[ (\|\alpha\|)^* = \begin{cases} k^*_{mn \cdots p} & \text{if the entries of } \|\alpha\| \text{ are all 1's in different columns,} \\ 0 & \text{otherwise,} \end{cases} \]

$m, n, \dots, p$ being the row sums of the entries in $\|\alpha\|$, and $k^*$ denoting a polykay
for the sample $\eta_1, \dots, \eta_r$ which defined the bisample in question. The same is
obviously true for the population.
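For degree 2 the statement of Case I can be verified directly. The sketch below is a hypothetical illustration, not part of the paper: it builds the array whose rows are constant, computes the four degree-2 bipolykays from the g.s.m. definitions of Section 2, and compares them with the polykays of the η sample. The η values and all names are made up, and integer values are assumed.

```python
from fractions import Fraction
from itertools import permutations, product

def degree2_bipolykays(x):
    """Degree-2 bipolykays of an r x c integer array (brute force over cell pairs)."""
    r, c = len(x), len(x[0])
    sums = {"sq": 0, "row": 0, "col": 0, "diag": 0}
    cells = list(product(range(r), range(c)))
    for (i, j), (k, l) in product(cells, repeat=2):
        kind = ("sq" if (i, j) == (k, l) else
                "row" if i == k else "col" if j == l else "diag")
        sums[kind] += x[i][j] * x[k][l]
    g_sq = Fraction(sums["sq"], r * c)
    g_row = Fraction(sums["row"], r * c * (c - 1))
    g_col = Fraction(sums["col"], r * c * (r - 1))
    g_diag = Fraction(sums["diag"], r * c * (r - 1) * (c - 1))
    return {"cell": g_sq - g_row - g_col + g_diag,   # entry 2 in one cell
            "rows": g_row - g_diag,                  # two 1's in one row
            "cols": g_col - g_diag,                  # two 1's in one column
            "diag": g_diag}                          # two 1's, distinct rows and columns

def k11(xs):
    n = len(xs)
    return Fraction(sum(a * b for a, b in permutations(xs, 2)), n * (n - 1))

def k2(xs):
    return Fraction(sum(a * a for a in xs), len(xs)) - k11(xs)

eta = [2, 5, 11]                      # hypothetical sample of row contributions
c = 4                                 # arbitrary number of columns
x = [[e] * c for e in eta]            # every row is constant
bp = degree2_bipolykays(x)
```

Only the two bipolykays whose entries are 1's in different columns survive: the one with row sum 2 gives $k_2^*$, and the one with row sums 1 and 1 gives $k_{11}^*$.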

Case II. The remarks just made for the η's apply to the population of ξ's
if we change rows to columns, single primes and asterisks to double primes and
asterisks, etc.

Case III. The population of ω's enters as in Case IV of Section 7 of [3]. There
it was shown that $(\|\alpha\|)^{***}$ becomes, on the average (over the kind of randomiza-
tion that is pertinent here),

\[ \begin{cases} k'''_{mn \cdots p} & \text{if } m, n, \dots, p \text{ are the entries of } \|\alpha\| \text{ and if all are in different rows and in different columns,} \\ 0 & \text{otherwise.} \end{cases} \]

Case IV. In sampling θ we take a sample of size 1 and make it a bisample by
putting this one number into every cell of the r × c matrix. Referring to Case III
of Section 7 of [3], we see that

\[ (\|\alpha\|)^{****} = \begin{cases} \theta^m & \text{if all } m \text{ entries of } \|\alpha\| \text{ are 1's in different rows and different columns,} \\ 0 & \text{otherwise.} \end{cases} \]

Hence ave $(\|\alpha\|)^{****} = \langle m \rangle''''$ or 0, respectively.

Keeping these facts in mind, together with the fact that $k_1' = k_1'' = k_1''' = 0$,
we can apply the pairing formulas to the bipolykays of degree 4 or less to obtain
some useful results that are collected in Table 3 below. We first derive a few
of these results to show how the pairing formulas are used.
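
The only first-degree bipolykay is, of course, indecomposable, so its pairing
formula (Section 2) gives us

\[ \text{ave aver} \begin{pmatrix}1&-\\-&-\end{pmatrix}
   = \begin{pmatrix}1&-\\-&-\end{pmatrix}' + \begin{pmatrix}1&-\\-&-\end{pmatrix}'' + \begin{pmatrix}1&-\\-&-\end{pmatrix}''' + \begin{pmatrix}1&-\\-&-\end{pmatrix}''''
   = k_1' + k_1'' + k_1''' + k_1'''' = k_1'''', \]

since $k_1' = k_1'' = k_1''' = 0$.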
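The indecomposable bipolykays of degree 2 can be treated in exactly the
same way; e.g.

\[ \text{ave aver} \begin{pmatrix}2&-\\-&-\end{pmatrix}
   = \begin{pmatrix}2&-\\-&-\end{pmatrix}' + \begin{pmatrix}2&-\\-&-\end{pmatrix}'' + \begin{pmatrix}2&-\\-&-\end{pmatrix}''' + \begin{pmatrix}2&-\\-&-\end{pmatrix}''''
   = k_2''', \]

since the term $\begin{pmatrix}2&-\\-&-\end{pmatrix}'''$ is really aver $\begin{pmatrix}2&-\\-&-\end{pmatrix}^{***}$, which is $k_2'''$ by the remarks
under Case III above. The other terms, i.e. those with one, two, and four primes,
vanish in accordance with the remarks made in Cases I, II, and IV, above.
Decomposable bipolykays lead to more complex expressions. Averaging
$\begin{pmatrix}1&-\\-&1\end{pmatrix}$, for example, produces

\[ \text{ave aver} \begin{pmatrix}1&-\\-&1\end{pmatrix}
   = \begin{pmatrix}1&-\\-&1\end{pmatrix}' + \begin{pmatrix}1&-\\-&1\end{pmatrix}'' + \begin{pmatrix}1&-\\-&1\end{pmatrix}''' + \begin{pmatrix}1&-\\-&1\end{pmatrix}''''
   + 2 \sum_{u<v} \begin{pmatrix}1&-\\-&-\end{pmatrix}^{u} \begin{pmatrix}1&-\\-&-\end{pmatrix}^{v} \]
\[ = k_{11}' + k_{11}'' + k_{11}''' + \langle 2 \rangle'''' . \]
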


As an example for degree 4, we consider the decomposable bipolykay

\[ F_{11} = \begin{pmatrix} 1 & 1 & - \\ - & - & 1 \\ - & - & 1 \end{pmatrix}. \]

The pairing formula is

\[ \text{ave aver}\; F_{11} = \text{ave aver}\; (F_{11}^{*} + F_{11}^{**} + F_{11}^{***} + F_{11}^{****})
   + \text{ave aver} \sum \begin{pmatrix}1&1\\-&-\end{pmatrix}^{u} \begin{pmatrix}1&-\\1&-\end{pmatrix}^{v}, \]

where u and v represent different numbers (1, 2, 3, or 4) of asterisks. By the re-
marks made under Cases I through IV, each term of the form ave aver $F_{11}^{u}$ vanishes,
as does each term ave aver $\begin{pmatrix}1&1\\-&-\end{pmatrix}^{u}$ ave aver $\begin{pmatrix}1&-\\1&-\end{pmatrix}^{v}$ except in the case where
u and v represent a single and double asterisk, respectively. (The assumption of
independence in sampling provides that the average of the sum of products is
equal to the sum of products of averages.) Hence we have

\[ \text{ave aver}\; F_{11} = \text{ave aver} \begin{pmatrix}1&1\\-&-\end{pmatrix}^{*} \begin{pmatrix}1&-\\1&-\end{pmatrix}^{**} = k_2' k_2'' . \]

Continuing in this way, we obtain the results shown in Table 3.

The following, omitted from Table 3, have expectation 0: 7., Ts, Ts; Fi, 
Fi; , Fr ’ Fis , Fie » Pu » Fo, Fa » Poe , Fos , Fo » Fos, Fos , Foo , Fo, Fs , and Fx. 

The following have expectations which are complex expressions that will not 


be used in this paper: (7 a 371,72,73,77; Fi, Fe, Fs, Fe, Fr, Fu, and Fx. 


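The first formula in Table 3 says that $\begin{pmatrix}1&-\\-&-\end{pmatrix}$, the mean of the x's, is an un-
biased estimator of the mean of the θ population, which is obvious, since the
other populations have zero means. The second formula (reading down) says
that $\begin{pmatrix}1&1\\-&-\end{pmatrix}$ is an unbiased estimator of the component of variance for rows.
In (4) it was pointed out that

\[ \begin{pmatrix}1&1\\-&-\end{pmatrix} = (MSR - MSB) / c, \]

so this is the usual result. Similarly for the third formula. The fourth says that
$\begin{pmatrix}2&-\\-&-\end{pmatrix}$ is an unbiased estimator of the "error variance." Since $\begin{pmatrix}2&-\\-&-\end{pmatrix}$ is
MSB, this is well known. Interpretation of the formulas for the F's will be given
in Section 6.

TABLE 3
Bipolykays (column A) and their expected values (column B) in model (10)

A                                           B

$\begin{pmatrix}1&-\\-&-\end{pmatrix}$      $k_1''''$
$\begin{pmatrix}1&1\\-&-\end{pmatrix}$      $k_2'$
$\begin{pmatrix}1&-\\1&-\end{pmatrix}$      $k_2''$
$\begin{pmatrix}2&-\\-&-\end{pmatrix}$      $k_2'''$
$F_4$                                       $k_{22}'$
$F_5$                                       $k_{22}''$
$F_8$                                       $k_4'$
$F_9$                                       $k_4''$
$F_{11}$                                    $k_2' k_2''$
$F_{18}$                                    $k_2' k_2'''$
$F_{19}$                                    $k_2'' k_2'''$
$F_{27}$                                    $k_{22}'''$
$F_{33}$                                    $k_4'''$

[The three rows for bipolykays of degree 3 are not legible in the source.]
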
The two-way model with replications,

\[ x_{ijk} = \theta + \eta_i + \xi_j + \omega_{ijk}, \tag{11} \]

where everything is as in (10) except for the greater number of ω's sampled,
can be treated in the same way if we suppose the population of ω's to be infinite.
First we find $x_{ij\cdot}$, the average in each cell. Then

\[ x_{ij\cdot} = \theta + \eta_i + \xi_j + \omega_{ij\cdot}, \]

and we have the same situation as before, with this exception: if $P_1$ represents
the population of ω's, the $\omega_{ij\cdot}$ come from the population $P_b$ of means of samples
of size b from $P_1$. It remains only to find the polykays of $P_b$ in terms of those of $P_1$.
Any $\omega_{ij+} = \sum_k \omega_{ijk}$ can be thought of as a sum of b numbers from b populations
all equal to $P_1$. Hence if $k'''^{(+)}$ and $k'''^{(\cdot)}$ represent polykays for the $\omega_{ij+}$ and $\omega_{ij\cdot}$,
respectively, we have, by the pairing formula for $k_{22}$, for example,

\[ \text{ave aver}\; k_{22}'''^{(+)} = b\,k_{22}''' + b(b-1)\,k_2''' k_2''' = b^2 k_{22}''' . \]

Finally, since $k_{22}$ is of degree 4, we divide by $b^4$ to obtain

\[ \text{ave aver}\; k_{22}'''^{(\cdot)} = k_{22}''' / b^2 . \]

In similar fashion we find that

\[ \text{ave aver}\; k_2'''^{(\cdot)} = k_2''' / b, \]
\[ \text{ave aver}\; k_3'''^{(\cdot)} = k_3''' / b^2, \]
\[ \text{ave aver}\; k_4'''^{(\cdot)} = k_4''' / b^3 . \]
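For an infinite population the polykays $k_2$, $k_3$, $k_4$ coincide with the cumulants, so the scaling by $b$, $b^2$, $b^3$ can be checked exactly on a small discrete distribution. The Python sketch below is an illustration under that assumption; the three-point distribution and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import prod

def cumulants(dist):
    """Second, third and fourth cumulants of a discrete distribution [(value, prob), ...]."""
    mean = sum(p * v for v, p in dist)
    m2, m3, m4 = (sum(p * (v - mean) ** e for v, p in dist) for e in (2, 3, 4))
    return m2, m3, m4 - 3 * m2 ** 2

def distribution_of_cell_mean(dist, b):
    """Exact distribution of the mean of b independent draws from dist."""
    out = {}
    for draws in product(dist, repeat=b):
        value = sum(v for v, _ in draws) / b
        out[value] = out.get(value, 0) + prod(p for _, p in draws)
    return list(out.items())

# Hypothetical population of cell contributions.
dist = [(Fraction(0), Fraction(1, 2)),
        (Fraction(1), Fraction(1, 3)),
        (Fraction(4), Fraction(1, 6))]
b = 3
k2, k3, k4 = cumulants(dist)
k2_mean, k3_mean, k4_mean = cumulants(distribution_of_cell_mean(dist, b))
```

The same check gives the $k_{22}$ rule as well, because $k_{22} = k_2^2$ for an infinite population.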


It follows that for model (11), Table 3 remains as it is, except that $k_2'''$, $k_3'''$,
$k_{22}'''$, and $k_4'''$, wherever they appear, must be divided, respectively, by $b$, $b^2$,
$b^2$, and $b^3$. For example,

\[ \text{ave aver}\; F_{18} = k_2' k_2''' / b, \]

$F_{18}$ being defined over the matrix $\|x_{ij\cdot}\|$.



6. Estimating variance components and their variances in the case of no
interactions. In this section we consider applications to the case described by
model (10) or (11). We begin with model (10), where components of variance
for rows, columns, and "error" are $k_2'$, $k_2''$, and $k_2'''$, respectively, and for which
respective unbiased estimators (Table 3) are $\begin{pmatrix}1&1\\-&-\end{pmatrix}$, $\begin{pmatrix}1&-\\1&-\end{pmatrix}$, and $\begin{pmatrix}2&-\\-&-\end{pmatrix}$.
In order to find the variances or covariances of these estimates, one may proceed
as in this example:

\[ \text{var} \begin{pmatrix}1&1\\-&-\end{pmatrix}
   = \text{ave aver} \begin{pmatrix}1&1\\-&-\end{pmatrix}^2 - \Bigl\{ \text{ave aver} \begin{pmatrix}1&1\\-&-\end{pmatrix} \Bigr\}^2
   = \text{ave aver} \begin{pmatrix}1&1\\-&-\end{pmatrix}^2 - (k_2')^2 . \tag{12} \]

Referring to the relevant multiplication formulas, we find that

\[ (k_2')^2 = \frac{R+1}{R-1}\, k_{22}' + \frac{1}{R}\, k_4' \]

([5], p. 516) and that

\[ \begin{pmatrix}1&1\\-&-\end{pmatrix}^2 = [\,rc(r+1)(c-1)F_4 + c(r-1)(c-1)F_8
   + 4r^2(c-1)F_{13} + 2r^2 F_{17} + 4r(c-1)F_{18} \]
\[ \qquad + 4(r-1)(c-1)F_{22} + 2rF_{27} + 2(r-1)F_{29}\,] \,/\, rc(r-1)(c-1) \]

(by Section 9 of [3]).
From Table 3 it follows that

\[ \text{ave aver} \begin{pmatrix}1&1\\-&-\end{pmatrix}^2 = [\,rc(r+1)(c-1)k_{22}' + c(r-1)(c-1)k_4' + 4r(c-1)k_2'k_2'''
   + 2rk_{22}'''\,] \,/\, rc(r-1)(c-1). \]


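Hence (12) becomes

\[ \text{var} \begin{pmatrix}1&1\\-&-\end{pmatrix} = \frac{2}{c(r-1)(c-1)}\, k_{22}''' + \frac{4}{c(r-1)}\, k_2' k_2'''
   + \Bigl( \frac{2}{r-1} - \frac{2}{R-1} \Bigr) k_{22}' + \Bigl( \frac{1}{r} - \frac{1}{R} \Bigr) k_4' . \tag{13} \]

In similar fashion one obtains

\[ \text{var} \begin{pmatrix}1&-\\1&-\end{pmatrix} = \frac{2}{r(r-1)(c-1)}\, k_{22}''' + \frac{4}{r(c-1)}\, k_2'' k_2'''
   + \Bigl( \frac{2}{c-1} - \frac{2}{C-1} \Bigr) k_{22}'' + \Bigl( \frac{1}{c} - \frac{1}{C} \Bigr) k_4'', \tag{14} \]

\[ \text{var} \begin{pmatrix}2&-\\-&-\end{pmatrix} = \Bigl( \frac{1}{rc} - \frac{1}{N} \Bigr) k_4'''
   + \Bigl( \frac{2}{(r-1)(c-1)} - \frac{2}{N-1} \Bigr) k_{22}''', \tag{15} \]

\[ \text{cov} \Bigl\{ \begin{pmatrix}1&1\\-&-\end{pmatrix}, \begin{pmatrix}1&-\\1&-\end{pmatrix} \Bigr\} = 2k_{22}''' \,/\, rc(r-1)(c-1), \tag{16} \]

\[ \text{cov} \Bigl\{ \begin{pmatrix}1&1\\-&-\end{pmatrix}, \begin{pmatrix}2&-\\-&-\end{pmatrix} \Bigr\} = -2k_{22}''' \,/\, c(r-1)(c-1), \tag{17} \]

\[ \text{cov} \Bigl\{ \begin{pmatrix}1&-\\1&-\end{pmatrix}, \begin{pmatrix}2&-\\-&-\end{pmatrix} \Bigr\} = -2k_{22}''' \,/\, r(r-1)(c-1). \tag{18} \]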

Formulas (13) through (18) have, of course, been derived without the use
of bipolykays, and are not new. Our interest here is in using them to derive
unbiased estimators of the variances and covariances of estimated variance
components. In the above formulas each variance is given in terms of population
polykays $k_2'$, $k_{22}'$, etc. If one could observe the actual η's, ξ's, and ω's that are
sampled, the sample polykays $k_2^{*}$, $k_{22}^{**}$, etc., would then provide unbiased esti-
mators of the population polykays. In practice, however, one cannot do this,
so that the formulas above do not provide unbiased estimates.

Inspection of formulas (13) through (18) shows that one would like to have
unbiased estimators for the following:

\[ k_{22}',\; k_{22}'',\; k_{22}''',\; k_2' k_2''',\; k_2'' k_2''',\; k_4',\; k_4'',\; k_4''' . \]

Such estimators are provided at once by Table 3, and are, respectively, the
following bipolykays computed over the matrix of x's:

\[ F_4,\; F_5,\; F_{27},\; F_{18},\; F_{19},\; F_8,\; F_9,\; F_{33} . \]

Substituting these into formula (13), we have

\[ \text{Unbiased estimator for var} \begin{pmatrix}1&1\\-&-\end{pmatrix} = \frac{2}{c(r-1)(c-1)}\, F_{27}
   + \frac{4}{c(r-1)}\, F_{18} + \Bigl( \frac{2}{r-1} - \frac{2}{R-1} \Bigr) F_4 + \Bigl( \frac{1}{r} - \frac{1}{R} \Bigr) F_8 , \]

and similarly for the other formulas.
Turning now to model (11), with replications, and again supposing the popu-
lation of ω's to be infinite, we note that estimators for $k_2'$, $k_2''$, and $k_2'''$ are,
respectively, $\begin{pmatrix}1&1\\-&-\end{pmatrix}$, $\begin{pmatrix}1&-\\1&-\end{pmatrix}$, and $\begin{pmatrix}2&-\\-&-\end{pmatrix} / b$, these bipolykays being applied
to the matrix $\|x_{ij\cdot}\|$. Hence (13) through (18) give us the formulas for the vari-
ances and covariances of these estimators if only we divide the entire right-hand
sides of (15), (17), and (18) by $b^2$, $b$, and $b$, respectively, and change (in all for-
mulas) all of the three-primed quantities in the manner indicated at the end of
Section 5.


7. Moments in bisampling. The problem that led originally to the development
of bipolykays (see [1]) came from this model of the educational testing process:
Given a population of C questions (a "test") and a population of R examinees,
suppose that the score of examinee I on question J will be $x_{IJ}$. A "test form,"
consisting of a random sample of c of the C questions, is given to each of a random
sample of r of the R examinees. The "test score" of the ith examinee is $\sum_j x_{ij}$;
the average test score of the group is $\sum\sum x_{ij}/r$, etc. One wants means and vari-
ances of quantities such as these.

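Insofar as problems connected with bisampling can be expressed in terms
of finding low moments of first- and second-degree symmetric polynomial func-
tions, they can be easily solved by application of bipolykays. For degree 1, we
have, for example,

\[ \text{var} \begin{pmatrix}1&-\\-&-\end{pmatrix} = \Bigl( \frac{1}{rc} - \frac{1}{RC} \Bigr) \begin{pmatrix}2&-\\-&-\end{pmatrix}'
   + \Bigl( \frac{1}{r} - \frac{1}{R} \Bigr) \begin{pmatrix}1&1\\-&-\end{pmatrix}'
   + \Bigl( \frac{1}{c} - \frac{1}{C} \Bigr) \begin{pmatrix}1&-\\1&-\end{pmatrix}' \]

by Section 10 of [3]

\[ = \frac{1}{C} \Bigl( \frac{1}{r} - \frac{1}{R} \Bigr) MSR' + \frac{1}{R} \Bigl( \frac{1}{c} - \frac{1}{C} \Bigr) MSC'
   + \Bigl( \frac{1}{r} - \frac{1}{R} \Bigr) \Bigl( \frac{1}{c} - \frac{1}{C} \Bigr) MSB' \]

by (3), (4), (5).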


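As far as first moments of functions of degree 2 are concerned, we can derive
the following formulas:

\[ E(MSR) = MSB' + \frac{c}{C} (MSR' - MSB') = \Bigl( 1 - \frac{c}{C} \Bigr) \sigma_I^2 + c\,\sigma_R^2 , \tag{19} \]
\[ E(MSC) = MSB' + \frac{r}{R} (MSC' - MSB') = \Bigl( 1 - \frac{r}{R} \Bigr) \sigma_I^2 + r\,\sigma_C^2 , \tag{20} \]
\[ E(MSB) = \sigma_I^2 . \tag{21} \]

We use one of these to illustrate the derivations:

\[ E(MSR) = \text{ave} \Bigl\{ \begin{pmatrix}2&-\\-&-\end{pmatrix} + c \begin{pmatrix}1&1\\-&-\end{pmatrix} \Bigr\} \quad \text{from Table 2} \]
\[ = \begin{pmatrix}2&-\\-&-\end{pmatrix}' + c \begin{pmatrix}1&1\\-&-\end{pmatrix}' \quad \text{by inheritance on the average} \]
\[ = MSB' + \frac{c}{C} (MSR' - MSB') \quad \text{by (3) and (4)} \]
\[ = \Bigl( 1 - \frac{c}{C} \Bigr) \sigma_I^2 + c\,\sigma_R^2 \quad \text{by (7) and (9).} \]

Equations (19), (20), and (21) lead at once to the following unbiased estimators
of the $\sigma^2$'s:

\[ \hat\sigma_I^2 = MSB = \begin{pmatrix}2&-\\-&-\end{pmatrix}, \tag{22} \]
\[ \hat\sigma_R^2 = \frac{1}{c} MSR - \Bigl( \frac{1}{c} - \frac{1}{C} \Bigr) MSB = \begin{pmatrix}1&1\\-&-\end{pmatrix} + \begin{pmatrix}2&-\\-&-\end{pmatrix} / C, \tag{23} \]
\[ \hat\sigma_C^2 = \frac{1}{r} MSC - \Bigl( \frac{1}{r} - \frac{1}{R} \Bigr) MSB = \begin{pmatrix}1&-\\1&-\end{pmatrix} + \begin{pmatrix}2&-\\-&-\end{pmatrix} / R. \tag{24} \]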
r rR :s = 


We turn next to the variances of the functions of degree 2. Variances and
covariances of the bipolykays were tabulated in Section 10 of [3]. From these
it is easy to find expressions, in terms of the bipolykays of degree 4, for the vari-
ances of the mean squares of the bisample, though these expressions are quite
long. To find the variance of MSR, for example, we recall from Table 2 that

\[ MSR = \begin{pmatrix}2&-\\-&-\end{pmatrix} + c \begin{pmatrix}1&1\\-&-\end{pmatrix}. \]

Hence

\[ \text{var}\; MSR = \text{var} \begin{pmatrix}2&-\\-&-\end{pmatrix}
   + 2c\, \text{cov} \Bigl\{ \begin{pmatrix}2&-\\-&-\end{pmatrix}, \begin{pmatrix}1&1\\-&-\end{pmatrix} \Bigr\}
   + c^2\, \text{var} \begin{pmatrix}1&1\\-&-\end{pmatrix}. \]

The three terms on the right-hand side of this equation are given in Section 10
of [3], leading to an expression for var MSR in terms of bipolykays of degree 4.
Variances of the estimated variance components can be found in the same way.
These expressions for variances of quadratic expressions are long and clumsy.
Formulas for unbiased estimators of these variances, however, are less compli-
cated. Suppose, for example, that we want an unbiased estimator for the variance
of $\hat\sigma_R^2$. We have


2 2\2 2\2 
var Gp = ave (Gz) — (ave - 


Lat Some Feel ve 


7 | Fs 30 + C(2Fi7 + 4F os + Fin) + C°(4Fi; + 2F is) 


a4 1 
= ave dr — CG) 


\ 
+ CF) +5 = Fs 33 + C(3F% + 4F'u) + 6C’F ee + C*Fs) Y, 







the last step following from the multiplication table in Section 9 of [3]. It follows
that we have this unbiased estimator of var $\hat\sigma_R^2$:


‘ 1/R+1,, losses : ' . te 
a wal = ; [Foo + C(2Fiz + 4F 5 + Fox) + C’(4F is + 2F is) + C*FI 


+s [Fss + C(3Fo + 4Fun) + 6C’F 2 + C*Fs! i . 


In a numerical case, an estimator for var $\hat\sigma_R^2$ is not likely to be wanted unless
$\hat\sigma_R^2$ itself has already been computed, so there is no reason for expanding $\hat\sigma_R^4$ in
bipolykays. It is of interest to note that the exponent of C (inside square
brackets above) is one less than the number of columns appearing in the
primary notation for each of the accompanying bipolykays; those bipolykays
occurring in the first set of square brackets each have two rows, and those in
the second set have one row.


8. Analysis of variance with interaction. We return now to the full model
(2), supposing only for the present that b = 1, so that the model is

\[ x_{ij} = \theta + \eta_i + \xi_j + \lambda_{ij} + \omega_{ij} . \tag{25} \]

That part of the sampling which pertains to the η's, ξ's, λ's, and θ can be
described more simply (from the algebraic point of view, at least) by saying that

\[ \rho_{ij} = \theta + \eta_i + \xi_j + \lambda_{ij}, \qquad i = 1, 2, \dots, r;\; j = 1, 2, \dots, c, \]

form a bisample from some R × C matrix. (See Section 4.) The model (25)
then reduces to

\[ x_{ij} = \rho_{ij} + \omega_{ij}, \tag{26} \]

where the $\rho_{ij}$ are a bisample from a matrix $\|\rho_{IJ}\|$ for which $\sigma_R^2$, $\sigma_C^2$, and $\sigma_I^2$ are
defined as in Section 4, and the ω's are a sample of size rc from a population of
variance $\sigma^2$. We shall use here one and two primes or asterisks to refer to func-
tions over populations or bisamples related to ρ and ω, respectively.

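To find E(MSR), say, for this model, we recall that

\[ MSR = \begin{pmatrix}2&-\\-&-\end{pmatrix} + c \begin{pmatrix}1&1\\-&-\end{pmatrix} \quad \text{from Table 2.} \]

Taking averages and using the pairing formulas (Section 2), we have

\[ E(MSR) = \text{ave aver} \begin{pmatrix}2&-\\-&-\end{pmatrix}^{*} + \text{ave aver} \begin{pmatrix}2&-\\-&-\end{pmatrix}^{**}
   + c \Bigl\{ \text{ave aver} \begin{pmatrix}1&1\\-&-\end{pmatrix}^{*} + \text{ave aver} \begin{pmatrix}1&1\\-&-\end{pmatrix}^{**} \Bigr\}. \]

For the population of ω's we have aver $\begin{pmatrix}2&-\\-&-\end{pmatrix}^{**} = k_2''$ and aver $\begin{pmatrix}1&1\\-&-\end{pmatrix}^{**} = 0$,
as was pointed out in Section 5. Hence

\[ E(MSR) = \begin{pmatrix}2&-\\-&-\end{pmatrix}' + k_2'' + c \begin{pmatrix}1&1\\-&-\end{pmatrix}'
   = \Bigl( 1 - \frac{c}{C} \Bigr) \sigma_I^2 + c\,\sigma_R^2 + \sigma^2 \quad \text{by (3) and (8).} \]

Similarly,

\[ E(MSC) = \Bigl( 1 - \frac{r}{R} \Bigr) \sigma_I^2 + r\,\sigma_C^2 + \sigma^2 \]

and

\[ E(MSB) = \sigma_I^2 + \sigma^2 . \]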

Such results have been obtained before, for example in [4], though perhaps
not so simply.

In model (26) the lack of replication of course leaves the ω's and λ's con-
founded. We suppose now that there are b observations per cell, so that the
model can be written

\[ x_{ijk} = \rho_{ij} + \omega_{ijk} . \tag{27} \]

As in Section 5, we consider the matrix of

\[ x_{ij\cdot} = \rho_{ij} + \omega_{ij\cdot}, \]

supposing again that the population of ω's is infinite. We then obtain

\[ E(MSR) = \Bigl( 1 - \frac{c}{C} \Bigr) \sigma_I^2 + c\,\sigma_R^2 + \sigma^2/b, \tag{28} \]
\[ E(MSC) = \Bigl( 1 - \frac{r}{R} \Bigr) \sigma_I^2 + r\,\sigma_C^2 + \sigma^2/b, \tag{29} \]
\[ E(MSB) = \sigma_I^2 + \sigma^2/b, \tag{30} \]

where MSR means MSR for the matrix $\|x_{ij\cdot}\|$, etc. Since

\[ MSW = \frac{1}{rc} \sum_{i,j} \frac{1}{b-1} \sum_k (x_{ijk} - x_{ij\cdot})^2 \]

has expectation $\sigma^2$, we find from (28), (29), and (30) the following unbiased
estimators for the variance components:

\[ \hat\sigma^2 = MSW, \]
\[ \hat\sigma_I^2 = MSB - MSW/b, \]
\[ \hat\sigma_R^2 = MSR/c - \Bigl( \frac{1}{c} - \frac{1}{C} \Bigr) MSB - MSW/bC, \]
\[ \hat\sigma_C^2 = MSC/r - \Bigl( \frac{1}{r} - \frac{1}{R} \Bigr) MSB - MSW/bR. \]
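The four estimators can be computed from a three-way data array in a few lines. The following Python sketch is one possible arrangement, offered as an illustration only: the function name, the dictionary keys and the convention that `R=None` or `C=None` stands for an infinite population are assumptions, and the small additive data set is made up so that the answers can be checked by hand.

```python
from fractions import Fraction

def variance_component_estimates(x, R=None, C=None):
    """Unbiased estimators for model (27); x[i][j][k] holds integer observations.

    R and C are the population numbers of rows and columns; None means infinite.
    """
    r, c, b = len(x), len(x[0]), len(x[0][0])
    cell = [[Fraction(sum(x[i][j]), b) for j in range(c)] for i in range(r)]
    row = [sum(cell[i]) / c for i in range(r)]
    col = [sum(cell[i][j] for i in range(r)) / r for j in range(c)]
    grand = sum(row) / r
    # Mean squares of the matrix of cell means, plus the within-cell mean square.
    msr = c * sum((m - grand) ** 2 for m in row) / (r - 1)
    msc = r * sum((m - grand) ** 2 for m in col) / (c - 1)
    msb = sum((cell[i][j] - row[i] - col[j] + grand) ** 2
              for i in range(r) for j in range(c)) / ((r - 1) * (c - 1))
    msw = sum((x[i][j][k] - cell[i][j]) ** 2
              for i in range(r) for j in range(c) for k in range(b)) / (r * c * (b - 1))
    inv_R = Fraction(1, R) if R else Fraction(0)
    inv_C = Fraction(1, C) if C else Fraction(0)
    return {"error": msw,
            "interaction": msb - msw / b,
            "rows": msr / c - (Fraction(1, c) - inv_C) * msb - msw * inv_C / b,
            "columns": msc / r - (Fraction(1, r) - inv_R) * msb - msw * inv_R / b}

# Hypothetical purely additive data: x_ijk = a_i + g_j + d_k.
a, g, d = [1, 2, 6], [0, 5], [0, 2]
data = [[[ai + gj + dk for dk in d] for gj in g] for ai in a]
estimates = variance_component_estimates(data)
```

With these data the interaction estimate is negative (it equals minus MSW/b because MSB is zero), which shows that unbiased estimators of this kind are not constrained to be non-negative.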







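One would like to know the variances of these estimators, but in general the
third dimension introduced by the subscript k seems to preclude finding them
by means of bipolykays. However, if we are willing to assume that R and C are
infinite, we have

\[ \hat\sigma_R^2 = (MSR - MSB)/c = \begin{pmatrix}1&1\\-&-\end{pmatrix}, \]
\[ \hat\sigma_C^2 = (MSC - MSB)/r = \begin{pmatrix}1&-\\1&-\end{pmatrix}, \]

and the variances of these can be found as follows:

\[ \text{var}\; \hat\sigma_R^2 = \text{ave aver} \begin{pmatrix}1&1\\-&-\end{pmatrix}^2 - \Bigl\{ \text{ave aver} \begin{pmatrix}1&1\\-&-\end{pmatrix} \Bigr\}^2
   = \text{ave aver} \sum a_i F_i - \Bigl\{ \text{ave aver} \begin{pmatrix}1&1\\-&-\end{pmatrix} \Bigr\}^2, \]

where $\sum a_i F_i$ is the expression for $\begin{pmatrix}1&1\\-&-\end{pmatrix}^2$ in terms of bipolykays of degree 4,
the $a_i$ being functions of r and c given in Section 8 of [3]. In this sum 8 distinct
F's appear, and their pairing formulas in the present situation become as follows,
by virtue of the remarks under Case III in Section 5:

\[ \text{ave aver}\; F_{27} = F_{27}' + 2 \begin{pmatrix}2&-\\-&-\end{pmatrix}' k_2''^{(\cdot)} + k_{22}''^{(\cdot)}, \]
\[ \text{ave aver}\; F_{18} = F_{18}' + \begin{pmatrix}1&1\\-&-\end{pmatrix}' k_2''^{(\cdot)}, \]
\[ \text{ave aver}\; F_i = F_i', \qquad i = 4, 8, 13, 17, 22, 29, \]

where $k_2''^{(\cdot)}$, etc., are polykays of the population of $\omega_{ij\cdot}$ and can be expressed in
terms of the polykays of the $\omega_{ijk}$ as at the end of Section 5. Finally we have,
again from Case III in Section 5,

\[ \text{ave aver} \begin{pmatrix}1&1\\-&-\end{pmatrix} = \begin{pmatrix}1&1\\-&-\end{pmatrix}', \qquad
   \Bigl\{ \text{ave aver} \begin{pmatrix}1&1\\-&-\end{pmatrix} \Bigr\}^2 = F_4', \]

by Section 9 of [3], since R and C are infinite. Hence

\[ \text{var}\; \hat\sigma_R^2 = \sum a_i\, \text{ave aver}\; F_i - F_4', \]

and so

\[ rc(r-1)(c-1)\, \text{var}\; \hat\sigma_R^2 = 2rc(c-1)F_4' + c(r-1)(c-1)F_8'
   + 4r^2(c-1)F_{13}' + 2r^2 F_{17}' + 4r(c-1)F_{18}' \]
\[ \qquad + 4(r-1)(c-1)F_{22}' + 2rF_{27}' + 2(r-1)F_{29}'
   + 4r(c-1) \begin{pmatrix}1&1\\-&-\end{pmatrix}' k_2''/b
   + 4r \begin{pmatrix}2&-\\-&-\end{pmatrix}' k_2''/b + 2r\,k_{22}''/b^2 . \]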


9. Computation. In order to make use of the formulas developed in this paper, 
it is of course necessary to be able to compute the bipolykays in particular 
numerical situations. Those of degree 2 can be easily found from equations (3) 
through (6) of Section 4, after MSR, MSC, etc., have been computed by standard 
procedures. A method of computation has been developed for the bipolykays of 
degree 4, but it will not be given here, as it is very lengthy, and it is hoped that 
better procedures can be found; this method was reported in [2], copies of which 
may be obtained on request by writing the secretary of the Statistical Research 
Group, Box 708, Princeton, N. J. 


REFERENCES 


[1] R. Hooke, "Sampling from a matrix, with applications to the theory of testing," Statis-
tical Research Group, Princeton University, Memorandum Report 53, November,
1953.

[2] R. Hooke, "The estimation of polykays in the analysis of variance," Statistical Re-
search Group, Princeton University, Memorandum Report 56, May, 1954.

[3] R. Hooke, "Symmetric functions of a two-way array," Ann. Math. Stat., this issue.

[4] J. W. Tukey, "Interaction in a row-by-column design," Statistical Research Group,
Princeton University, Memorandum Report 18, July, 1949.

[5] J. W. Tukey, "Some sampling simplified," J. Amer. Stat. Assn., Vol. 45 (1950), pp.
501-519.




ON THE ESTIMATION OF REGRESSION COEFFICIENTS OF A
VECTOR-VALUED TIME SERIES WITH A STATIONARY
RESIDUAL¹


By Murray Rosenblatt²
University of Chicago

1. Summary. Time series which are realizations of a vector-valued stochastic 
process of dimension two with a stationary disturbance are considered. Linear 
estimates of the regression coefficients of the time series are discussed, in par- 
ticular the least-squares or classical estimate and the Markov estimate. The 
least-squares estimate is the estimate computed under the assumption that 
the components of the disturbance are orthogonal processes and orthogonal 
to each other. It is known that the Markov estimate is in general better than 
the least-squares estimate. The asymptotic behavior of the covariance matrices 
of the least-squares estimate and of the Markov estimate is described. Con- 
ditions under which the least-squares estimate is as good asymptotically as 
the Markov estimate are obtained, that is, conditions under which the least- 
squares estimate is efficient asymptotically in the class of linear unbiased 
estimates. The analogues of the results described for vector-valued time 
series of dimension greater than two can be seen to hold. 


2. Introduction. The presentation of the results of this paper is carried out 
for the case of a two-dimensional process because of the greater simplicity 
and clarity in exposition. The general n-dimensional case is briefly discussed 
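in Section 9. Let us consider a two-dimensional complex-valued discrete parameter
process, that is, a sequence of stochastic vectors

\[
y_t = \begin{pmatrix} {}_1y_t \\ {}_2y_t \end{pmatrix} = x_t + m_t
    = \begin{pmatrix} {}_1x_t \\ {}_2x_t \end{pmatrix} + \begin{pmatrix} {}_1m_t \\ {}_2m_t \end{pmatrix},
\qquad t = \dots, -1, 0, 1, \dots, \tag{2.1}
\]

where $m_t = E y_t$ is the mean value sequence and $x_t = y_t - m_t$ is the residual process.
We introduce the covariance sequence ($x_t'$ denotes the conjugated transpose of $x_t$)

\[
E(y_s - m_s)(y_t - m_t)' = E x_s x_t'
 = \begin{pmatrix} E\,{}_1x_s\,\overline{{}_1x_t} & E\,{}_1x_s\,\overline{{}_2x_t} \\ E\,{}_2x_s\,\overline{{}_1x_t} & E\,{}_2x_s\,\overline{{}_2x_t} \end{pmatrix}
 = \begin{pmatrix} {}_{11}r_{s,t} & {}_{12}r_{s,t} \\ {}_{21}r_{s,t} & {}_{22}r_{s,t} \end{pmatrix}
 = r_{s,t}. \tag{2.2}
\]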


The assumption that the random variables are complex-valued is made for 
mathematical convenience. The real-valued case is, of course, the one of greatest 


Received October 4, 1954 

1 Research carried out at the Statistical Research Center, University of Chicago, under 
sponsorship of the Statistics Branch, Office of Naval Research. 

2 Now at Indiana University.




statistical interest and is discussed in Section 7 in some detail. Sections 2 and 3 
are an extended discussion of the assumptions made in the paper and their 
motivation. All the assumptions made in Sections 2 and 3 (except possibly for 
that of a real-valued time series) will be held to in all sections except Section 8. 
The residual process $x_t$ is said to be stationary in the wide sense if $r_{s,t} = r_{s-t}$,
and I shall assume that this is the case. Then the covariance sequence has the
representation

\[
r_t = \int_{-\pi}^{\pi} e^{it\lambda}\, dF(\lambda), \tag{2.3}
\]


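where $F(\lambda)$ is a matrix-valued function

\[
F(\lambda) = \begin{pmatrix} F_{11}(\lambda) & F_{12}(\lambda) \\ F_{21}(\lambda) & F_{22}(\lambda) \end{pmatrix} \tag{2.4}
\]

that is nondecreasing; that is, $\Delta F(\lambda) \ge 0$ (cf. [2]). The functions $F_{11}(\lambda)$, $F_{22}(\lambda)$
are the spectral distribution functions of ${}_1x_t$, ${}_2x_t$, respectively, while $F_{12}(\lambda)$, $F_{21}(\lambda)$
are the cross-spectral distribution functions of the two coordinates of $x_t$. We as-
sume that the spectrum is absolutely continuous; that is, that

\[
F_{ij}(\lambda) = \int_{-\pi}^{\lambda} f_{ij}(u)\, du, \qquad i, j = 1, 2, \tag{2.5}
\]

and that the spectral densities $f_{ij}(\lambda)$ are continuous. The spectral densities $f_{ii}(\lambda)$,
$i = 1, 2$, are assumed to be positive. Note that $f_{12}(\lambda) = \overline{f_{21}(\lambda)}$. The inequality

\[
|f_{12}(\lambda)|^2 \le f_{11}(\lambda) f_{22}(\lambda) \tag{2.6}
\]

obviously holds. We shall assume that

\[
|f_{12}(\lambda)|^2 < f_{11}(\lambda) f_{22}(\lambda) \tag{2.7}
\]

for all $\lambda$.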
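
As a rough numerical illustration of (2.3) and (2.7), the following sketch builds the spectral density matrix of a hypothetical real vector moving average $x_t = e_t + B e_{t-1}$ (the model, the matrix `B`, and all function names are assumptions made for this example, not part of the paper) and checks on a grid that $f_{11}f_{22} - |f_{12}|^2$ stays positive.

```python
import cmath
import math

# Hypothetical MA(1) coefficient matrix for x_t = e_t + B e_{t-1},
# where e_t is white noise with identity covariance (an assumption).
B = [[0.5, 0.3], [-0.2, 0.4]]

def covariance(h):
    """r_h = E x_{t+h} x_t' for the real MA(1) model above."""
    if h == 0:
        return [[(1.0 if i == j else 0.0) + sum(B[i][k] * B[j][k] for k in range(2))
                 for j in range(2)] for i in range(2)]
    if h == 1:
        return [row[:] for row in B]
    if h == -1:
        return [[B[j][i] for j in range(2)] for i in range(2)]
    return [[0.0, 0.0], [0.0, 0.0]]

def spectral_density(lam):
    """f(lam) = (1/2pi) * sum_h r_h exp(-i h lam), which inverts (2.3)."""
    f = [[0j, 0j], [0j, 0j]]
    for h in (-1, 0, 1):
        r = covariance(h)
        w = cmath.exp(-1j * h * lam) / (2.0 * math.pi)
        for i in range(2):
            for j in range(2):
                f[i][j] += w * r[i][j]
    return f

def coherence_gap(lam):
    """f11*f22 - |f12|^2, which (2.7) requires to be positive."""
    f = spectral_density(lam)
    return (f[0][0] * f[1][1]).real - abs(f[0][1]) ** 2

grid = [-math.pi + k * (2.0 * math.pi / 200) for k in range(201)]
min_gap = min(coherence_gap(lam) for lam in grid)
```

For this particular `B` the gap is strictly positive on the grid, so the example spectrum would belong to the admissible set described in the text.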

We shall refer to the set of spectra satisfying this set of conditions as the admis-
sible set of spectra. The equality $|f_{12}(\lambda)|^2 = f_{11}(\lambda) f_{22}(\lambda)$ for all $\lambda$ amounts to a
linear relationship between the two coordinates ${}_1x_t$, ${}_2x_t$ of the form
${}_1x_t = \sum_j c_j\, {}_2x_{t-j}$. If the processes ${}_ix_t$ are orthogonal processes, the spectral
densities $f_{ii}(\lambda) = \sigma_i^2 / 2\pi$, $i = 1, 2$. Such processes are sometimes referred to as
"white noise." If the processes ${}_1x_t$, ${}_2x_t$ are orthogonal to each other, the cross-
spectral density $f_{12}(\lambda) \equiv 0$.

In Section 7 we shall assume that the process $x_t$ is a real process. This condition
imposes additional restraints on the spectrum, specifically that

\[
f_{ii}(\lambda) = f_{ii}(-\lambda), \tag{2.8}
\]

and $f_{12}(\lambda) = f_{21}(-\lambda)$. If the process is real, the admissible class of spectra must
satisfy these additional restraints.
Let the regression ${}_im_t$, $i = 1, 2$, be of the form

\[
{}_im_t = \sum_{\nu=1}^{p_i} {}_ic_\nu\, {}_i\varphi_t^{(\nu)}. \tag{2.9}
\]







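The problem posed is that of estimating the regression coefficients ${}_ic_\nu$ from a
time series $y_1, \dots, y_N$. The regression vectors

\[
{}_i\varphi^{(\nu)} = \begin{pmatrix} {}_i\varphi_1^{(\nu)} \\ \vdots \\ {}_i\varphi_N^{(\nu)} \end{pmatrix}
\]

are assumed known. We are interested in unbiased estimates that are linear
in the observations ${}_iy_t$, $i = 1, 2$; $t = 1, \dots, N$. The two linear estimates
that we are specifically interested in are the least-squares estimate and the Mar-
kov estimate. Let

\[
{}_im = \begin{pmatrix} {}_im_1 \\ \vdots \\ {}_im_N \end{pmatrix}, \qquad
{}_iy = \begin{pmatrix} {}_iy_1 \\ \vdots \\ {}_iy_N \end{pmatrix}, \qquad i = 1, 2,
\]

and

\[
m = \begin{pmatrix} {}_1m \\ {}_2m \end{pmatrix}, \qquad y = \begin{pmatrix} {}_1y \\ {}_2y \end{pmatrix}.
\]

Define the vectors ${}_ic$ and $c$ by

\[
{}_ic = \begin{pmatrix} {}_ic_1 \\ \vdots \\ {}_ic_{p_i} \end{pmatrix}, \quad i = 1, 2; \qquad
c = \begin{pmatrix} {}_1c \\ {}_2c \end{pmatrix}.
\]

Also define the matrices

\[
{}_i\Phi = \bigl({}_i\varphi^{(1)}, \dots, {}_i\varphi^{(p_i)}\bigr), \qquad
\Phi = \begin{pmatrix} {}_1\Phi & 0 \\ 0 & {}_2\Phi \end{pmatrix}.
\]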

The fact that $m_t$ is the mean value of $y_t$ can be written in vector form as

\[
m = Ey = \Phi c. \tag{2.10}
\]

The least-squares estimate $c_L^*$ is the estimate that minimizes the quadratic form

\[
(y - m)'(y - m) = (y - \Phi c)'(y - \Phi c);
\]

that is, $c_L^* = (\Phi'\Phi)^{-1}\Phi'y$. Note that we are assuming that $\Phi'\Phi$ is nonsingular. The
estimate $c_L^*$ is unbiased,

\[
E c_L^* = (\Phi'\Phi)^{-1}\Phi' E y = (\Phi'\Phi)^{-1}\Phi'\Phi c = c, \tag{2.11}
\]

and the covariance matrix of $c_L^*$ is

\[
E(c_L^* - c)(c_L^* - c)' = (\Phi'\Phi)^{-1}\Phi' R \Phi (\Phi'\Phi)^{-1}.
\]
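
The least-squares formulas above can be sketched numerically as follows. This is only an illustration for the real-valued case (so the conjugated transpose is the ordinary transpose); the design matrices, the residual covariance `R`, and all function names are assumptions chosen for the example.

```python
import numpy as np

def block_diag2(a, b):
    """Stack two design matrices into Phi = diag(a, b)."""
    out = np.zeros((a.shape[0] + b.shape[0], a.shape[1] + b.shape[1]))
    out[:a.shape[0], :a.shape[1]] = a
    out[a.shape[0]:, a.shape[1]:] = b
    return out

def least_squares(Phi, y):
    """c_L = (Phi' Phi)^{-1} Phi' y."""
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

def ls_covariance(Phi, R):
    """(Phi' Phi)^{-1} Phi' R Phi (Phi' Phi)^{-1}."""
    A = np.linalg.solve(Phi.T @ Phi, Phi.T)  # (Phi' Phi)^{-1} Phi'
    return A @ R @ A.T

# Hypothetical setup: component 1 has a constant and a linear trend,
# component 2 has a constant only.
N = 50
t = np.arange(1, N + 1, dtype=float) / N
X1 = np.column_stack([np.ones(N), t])
X2 = np.ones((N, 1))
Phi = block_diag2(X1, X2)
c_true = np.array([1.0, 0.5, -2.0])

# Assumed residual covariance: uncorrelated in time, correlated across components.
Sigma = np.array([[1.0, 0.6], [0.6, 1.0]])
R = np.kron(Sigma, np.eye(N))  # blocks R_ij = Sigma[i, j] * I, matching y = (1y, 2y)

rng = np.random.default_rng(0)
y = Phi @ c_true + np.linalg.cholesky(R) @ rng.standard_normal(2 * N)
c_ls = least_squares(Phi, y)
cov_ls = ls_covariance(Phi, R)
```

Because $\Phi$ is block diagonal, the least-squares estimate treats each component separately; only its covariance matrix depends on the cross-covariance blocks of `R`.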







The matrix $R$ is the covariance matrix of the vector $y$. Our assumptions con-
cerning the spectrum of the process $x_t$ imply that the matrix $R$ is nonsingular.
The Markov estimate is

\[
c_M^* = (\Phi' R^{-1} \Phi)^{-1} \Phi' R^{-1} y.
\]

It is also unbiased and its covariance matrix is

\[
E(c_M^* - c)(c_M^* - c)' = (\Phi' R^{-1} \Phi)^{-1}. \tag{2.12}
\]

The Markov estimate is minimum variance among all linear unbiased estimates
in the following sense. Consider any unbiased linear estimate $c^* = My$, $Ec^* =
M\Phi c = c$; that is, $M\Phi = I$. Its covariance matrix is

\[
E(c^* - c)(c^* - c)' = MRM'.
\]

One can then show that

\[
MRM' \ge (\Phi' R^{-1} \Phi)^{-1}.
\]


These remarks about the least-squares and Markov estimates are well known. 
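
The minimum-variance property can be checked numerically in a small real-valued case. The sketch below is an illustration under assumed inputs (the regressors, the Toeplitz factor `T`, and the cross-covariance `Sigma` are all hypothetical): it computes both covariance matrices and confirms that the least-squares covariance minus the Markov covariance has no negative eigenvalue.

```python
import numpy as np

def ls_covariance(Phi, R):
    """(Phi' Phi)^{-1} Phi' R Phi (Phi' Phi)^{-1}."""
    A = np.linalg.solve(Phi.T @ Phi, Phi.T)
    return A @ R @ A.T

def markov_estimate(Phi, R, y):
    """c_M = (Phi' R^{-1} Phi)^{-1} Phi' R^{-1} y, for symmetric R."""
    W = np.linalg.solve(R, Phi)                 # R^{-1} Phi
    return np.linalg.solve(Phi.T @ W, W.T @ y)

def markov_covariance(Phi, R):
    """(Phi' R^{-1} Phi)^{-1}, as in (2.12)."""
    return np.linalg.inv(Phi.T @ np.linalg.solve(R, Phi))

N = 40
t = np.arange(1, N + 1) / N
Phi = np.zeros((2 * N, 2))
Phi[:N, 0] = t      # component 1: linear trend (hypothetical)
Phi[N:, 1] = 1.0    # component 2: constant (hypothetical)

# Assumed residual covariance: autocorrelation 0.7^|s-t| in time,
# with a fixed cross-covariance between the two components.
lags = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
T = 0.7 ** lags
Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
R = np.kron(Sigma, T)

gap = ls_covariance(Phi, R) - markov_covariance(Phi, R)
min_eig = np.linalg.eigvalsh(gap).min()
```

A nonnegative `min_eig` (up to rounding) is what the inequality $MRM' \ge (\Phi'R^{-1}\Phi)^{-1}$ predicts when $M$ is taken to be the least-squares matrix $(\Phi'\Phi)^{-1}\Phi'$.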

We shall investigate the asymptotic behavior of the covariance matrices of 
the least-squares and the Markov estimate as $N \to \infty$. Note that the least-
squares estimate is identical with the Markov estimate when the processes
${}_ix_t$ are orthogonal processes and are orthogonal to each other. It is of consider-
able interest to find out when the least-squares estimate is asymptotically as
good as the Markov estimate, that is, when it is asymptotically efficient in the
set of linear unbiased estimates. Whenever we use the phrase asymptotic effi-
ciency we mean asymptotic efficiency in the class of linear unbiased estimates.
The least-squares estimate is much easier to compute than the Markov esti-
mate, since it does not require knowledge of the structure of the process $x_t$.
Even if the structure of the process $x_t$ is known, the computation of the inverse
$R^{-1}$ may be very tedious. We will discuss the question of asymptotic efficiency.
These problems are discussed in [3], [4], and [5] for one-dimensional time series. 
New aspects of these problems arise in the multidimensional case that we dis- 
cuss in this paper. The principal results of the paper are given in Sections 4, 5, 
6, and 7. 

The discussion is based on what might be called a generalized harmonic 
analysis of the regression vectors. In carrying out this analysis we will have to 
impose some conditions on the asymptotic behavior of these vectors. However, 
these conditions will be sufficiently broad to allow most of the usual types of 
regression sequences. The techniques used are similar to those employed in [5]. 


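3. The regression spectrum. Let ${}_i\Phi_N^{(\nu)} = \sum_{t=1}^{N} |{}_i\varphi_t^{(\nu)}|^2$, $i = 1, 2$. We
first assume that ${}_i\Phi_N^{(\nu)} \to \infty$ as $N \to \infty$. Some condition of this type is required if
we are to be able to estimate $c$ consistently. We also require that

\[
\lim_{N \to \infty} {}_i\Phi_{N+h}^{(\nu)} \big/ {}_i\Phi_N^{(\nu)} = 1 \tag{3.1}
\]

for every fixed $h$. Let the limits

\[
{}_{ij}M_h^{(\nu,\mu)} = \lim_{N \to \infty} \sum_{t=1}^{N}
\frac{\overline{{}_i\varphi_{t+h}^{(\nu)}}\; {}_j\varphi_t^{(\mu)}}{\sqrt{{}_i\Phi_N^{(\nu)}\; {}_j\Phi_N^{(\mu)}}}, \tag{3.2}
\]

$i, j = 1, 2$; $\nu = 1, \dots, p_i$; $\mu = 1, \dots, p_j$, exist for all $h \ge 0$. If we set ${}_i\varphi_t^{(\nu)} =
0$ for $t \le 0$, it can be seen that the limits ${}_{ij}M_h^{(\nu,\mu)}$, $h < 0$, exist and that

\[
{}_{ij}M_{-h}^{(\nu,\mu)} = \overline{{}_{ji}M_h^{(\mu,\nu)}}. \tag{3.3}
\]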
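
Conditions (3.1) and (3.2) can be checked numerically for a simple family of regression sequences. The sketch below assumes real polynomial regressors $\varphi_t^{(\nu)} = t^\nu$ (a hypothetical choice made for illustration; the function names are likewise not from the paper) and compares the finite-$N$ ratio in (3.2) with the closed form $\sqrt{(2\nu+1)(2\mu+1)}/(\nu+\mu+1)$ that follows from $\sum_{t \le N} t^k \approx N^{k+1}/(k+1)$.

```python
import math

def phi(t, nu):
    """Hypothetical polynomial regressor t^nu."""
    return float(t) ** nu

def norm_sq(N, nu):
    """Sum of squares of the regressor over t = 1..N."""
    return sum(phi(t, nu) ** 2 for t in range(1, N + 1))

def m_h(N, h, nu, mu):
    """Finite-N version of the normalized lagged cross product in (3.2)."""
    num = sum(phi(t + h, nu) * phi(t, mu) for t in range(1, N + 1))
    return num / math.sqrt(norm_sq(N, nu) * norm_sq(N, mu))

def m_limit(nu, mu):
    """Limit of m_h for polynomial regressors; it does not depend on h."""
    return math.sqrt((2 * nu + 1) * (2 * mu + 1)) / (nu + mu + 1)

N = 5000
ratio_31 = norm_sq(N + 3, 2) / norm_sq(N, 2)   # condition (3.1) with h = 3
approx = m_h(N, 3, 1, 2)
exact = m_limit(1, 2)
```

Since the limit is the same for every lag $h$, the regression spectrum of a purely polynomial regression is concentrated in a single jump at $\lambda = 0$.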


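Let the matrices

\[
{}_{ij}M_h = \{ {}_{ij}M_h^{(\nu,\mu)};\ \nu = 1, \dots, p_i,\ \mu = 1, \dots, p_j \}, \qquad i, j = 1, 2, \tag{3.4}
\]

and

\[
M_h = \begin{pmatrix} {}_{11}M_h & {}_{12}M_h \\ {}_{21}M_h & {}_{22}M_h \end{pmatrix}.
\]

The matrices $M_h$, $h = \dots, -1, 0, 1, \dots$, form a positive definite sequence;
that is, given any $(p_1 + p_2)$-vector $z$ and any finite vector $a$,

\[
\sum_{s,t} \bar a_s\, z' M_{s-t} z\, a_t \ge 0.
\]

The matrix sequence $M_h$ then has the representation

\[
M_h = \int_{-\pi}^{\pi} e^{ih\lambda}\, dM(\lambda), \qquad h = \dots, -1, 0, 1, \dots, \tag{3.5}
\]

where $M(\lambda)$ is a matrix-valued function that is nondecreasing so that $\Delta M(\lambda) \ge 0$
for all $\lambda$. Note that if all the regression vectors are real, we have $dM(\lambda) = \overline{dM(-\lambda)}$.
It will be convenient at times to write $M(\lambda)$ in the form

\[
M(\lambda) = \begin{pmatrix} {}_{11}M(\lambda) & {}_{12}M(\lambda) \\ {}_{21}M(\lambda) & {}_{22}M(\lambda) \end{pmatrix}, \tag{3.6}
\]

where ${}_{ij}M(\lambda)$ is a $p_i \times p_j$ matrix. It is clear that

\[
{}_{ij}M_h = \int_{-\pi}^{\pi} e^{ih\lambda}\, d\,{}_{ij}M(\lambda), \qquad i, j = 1, 2. \tag{3.7}
\]

The matrix-valued functions ${}_{11}M(\lambda)$, ${}_{22}M(\lambda)$ are nondecreasing and ${}_{12}M(\lambda) =
{}_{21}M(\lambda)'$. Let

\[
M_0 = M(\pi) - M(-\pi) = M \tag{3.8}
\]

and

\[
{}_{ii}M_0 = {}_{ii}M(\pi) - {}_{ii}M(-\pi) = {}_iM, \qquad i = 1, 2. \tag{3.9}
\]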




We shall assume that ${}_1M$ and ${}_2M$ are nonsingular. This means that there is no
vector

\[
z = (z_1, \dots, z_{p_i})' \ne 0
\]

such that

\[
\sum_{t=1}^{N} \Bigl| \sum_{\nu=1}^{p_i} z_\nu \frac{{}_i\varphi_t^{(\nu)}}{\sqrt{{}_i\Phi_N^{(\nu)}}} \Bigr|^2 \to 0
\]

as $N \to \infty$, $i = 1, 2$. Thus, vectors ${}_i\varphi^{(\nu)}$ are asymptotically linearly independent
in the sense described above. Such a condition is required if we are to be able
to estimate the regression coefficients consistently as $N \to \infty$. The conditions
imposed on the regression vectors are sufficiently broad to include the case of
polynomial or trigonometric regression or mixed polynomial and trigonometric
regression.

The singular case in which the regression of one component 2m, = 0 and one 
wishes to estimate the regression coefficients ;c, of the regression of the other 
component does not satisfy the conditions imposed on the regression vectors 
above. It is, however, of some interest and we shall discuss it in Section 8. 

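Let the diagonal matrix

\[
{}_iD_N = \operatorname{diag}\Bigl(\sqrt{{}_i\Phi_N^{(1)}}, \dots, \sqrt{{}_i\Phi_N^{(p_i)}}\Bigr), \quad i = 1, 2, \qquad
D_N = \begin{pmatrix} {}_1D_N & 0 \\ 0 & {}_2D_N \end{pmatrix}.
\]

We shall show that the limits of

\[
D_N E(c_L^* - c)(c_L^* - c)' D_N = D_N(\Phi'\Phi)^{-1}D_N \; D_N^{-1}\Phi' R \Phi D_N^{-1} \; D_N(\Phi'\Phi)^{-1}D_N
\]

and

\[
D_N E(c_M^* - c)(c_M^* - c)' D_N = D_N(\Phi' R^{-1} \Phi)^{-1} D_N
\]

exist as $N \to \infty$ and shall obtain expressions for these limits in terms of the
spectrum of $x_t$ and the regression spectrum (see Sections 4 and 5). It will be
convenient for us to write

\[
R = \begin{pmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{pmatrix}, \tag{3.10}
\]

where $R_{11}$, $R_{22}$ are the covariance matrices of ${}_1y$, ${}_2y$ respectively, while $R_{12} = R_{21}'$
is the cross-covariance matrix of ${}_1y$ with ${}_2y$.
Note that the assumptions concerning the regression vectors imply that

\[
\lim_{N \to \infty} D_N^{-1}(\Phi'\Phi)D_N^{-1} = \begin{pmatrix} {}_1M & 0 \\ 0 & {}_2M \end{pmatrix}, \tag{3.11}
\]

which is nonsingular.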


4. The least-squares estimate. 

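THEOREM 1. Under the assumptions made in Section 2 on the spectrum $f(\lambda)$
of the process $x_t$ and the assumptions made in Section 3 on the regression of the
process $y_t$,

\[
\lim_{N \to \infty} D_N E(c_L^* - c)(c_L^* - c)' D_N
= 2\pi \begin{pmatrix} {}_1M^{-1} & 0 \\ 0 & {}_2M^{-1} \end{pmatrix}
\begin{pmatrix}
\int_{-\pi}^{\pi} f_{11}(-\lambda)\, d\,{}_{11}M(\lambda) & \int_{-\pi}^{\pi} f_{12}(-\lambda)\, d\,{}_{12}M(\lambda) \\
\int_{-\pi}^{\pi} f_{21}(-\lambda)\, d\,{}_{21}M(\lambda) & \int_{-\pi}^{\pi} f_{22}(-\lambda)\, d\,{}_{22}M(\lambda)
\end{pmatrix}
\begin{pmatrix} {}_1M^{-1} & 0 \\ 0 & {}_2M^{-1} \end{pmatrix}. \tag{4.1}
\]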

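In discussing the asymptotic behavior of the covariance matrix of the least-
squares estimate, it will clearly be enough to consider $D_N^{-1}\Phi' R \Phi D_N^{-1}$. We will
approximate $R$ above and below by positive definite matrices of a simpler form.

Consider the quadratic form

\[
z'Rz = {}_1z' R_{11}\, {}_1z + {}_2z' R_{21}\, {}_1z + {}_1z' R_{12}\, {}_2z + {}_2z' R_{22}\, {}_2z,
\]

where $z = \begin{pmatrix} {}_1z \\ {}_2z \end{pmatrix}$ and ${}_1z$, ${}_2z$ are $N$-vectors, so as to see how to approximate $R$
conveniently. Clearly

\[
z'Rz = \int_{-\pi}^{\pi} |{}_1z(-\lambda)|^2 f_{11}(\lambda)\, d\lambda
 + \int_{-\pi}^{\pi} \overline{{}_2z(-\lambda)}\, f_{21}(\lambda)\, {}_1z(-\lambda)\, d\lambda
 + \int_{-\pi}^{\pi} \overline{{}_1z(-\lambda)}\, f_{12}(\lambda)\, {}_2z(-\lambda)\, d\lambda
 + \int_{-\pi}^{\pi} |{}_2z(-\lambda)|^2 f_{22}(\lambda)\, d\lambda,
\]

where ${}_iz(\lambda) = \sum_{t=1}^{N} {}_iz_t e^{it\lambda}$, $i = 1, 2$. Note that $z'Rz$ can be written in the
more convenient form

\[
z'Rz = \int_{-\pi}^{\pi} z(-\lambda)' f(\lambda)\, z(-\lambda)\, d\lambda,
\]

where

\[
z(\lambda) = \begin{pmatrix} {}_1z(\lambda) \\ {}_2z(\lambda) \end{pmatrix}, \qquad
f(\lambda) = \begin{pmatrix} f_{11}(\lambda) & f_{12}(\lambda) \\ f_{21}(\lambda) & f_{22}(\lambda) \end{pmatrix}.
\]

Now $f(\lambda)$ is nonsingular for all $\lambda$, since $f_{11}(\lambda) f_{22}(\lambda) > |f_{12}(\lambda)|^2$ for all $\lambda$.
Let

\[
\begin{pmatrix} a_1 & c_1 \\ \bar c_1 & b_1 \end{pmatrix}, \qquad \begin{pmatrix} a_2 & c_2 \\ \bar c_2 & b_2 \end{pmatrix}
\]

be two positive definite $2 \times 2$ matrices. They are positive definite if and only
if $a_i, b_i \ge 0$ and $a_i b_i \ge |c_i|^2$, $i = 1, 2$. Moreover,

\[
\begin{pmatrix} a_1 & c_1 \\ \bar c_1 & b_1 \end{pmatrix} \ge \begin{pmatrix} a_2 & c_2 \\ \bar c_2 & b_2 \end{pmatrix}
\]

if and only if $a_1 \ge a_2$, $b_1 \ge b_2$, and $(a_1 - a_2)(b_1 - b_2) - |c_1 - c_2|^2 \ge 0$. Now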
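$f_{11}(\lambda)$, $f_{22}(\lambda)$ are positive continuous functions and inequality (2.7) holds. Given
any $\epsilon > 0$, we can find finite trigonometric polynomials

\[
g_{ij}(\lambda) = \frac{1}{2\pi} \sum_{k=-q}^{q} {}_{ij}g_k\, e^{-ik\lambda}, \tag{4.2}
\]

\[
h_{ij}(\lambda) = \frac{1}{2\pi} \sum_{k=-q}^{q} {}_{ij}h_k\, e^{-ik\lambda}, \tag{4.3}
\]

$i, j = 1, 2$, satisfying the inequalities

\[
\begin{gathered}
g_{11}(\lambda) \ge f_{11}(\lambda) \ge h_{11}(\lambda) > 0, \\
g_{22}(\lambda) \ge f_{22}(\lambda) \ge h_{22}(\lambda) > 0, \\
g_{11}(\lambda) g_{22}(\lambda) > |g_{12}(\lambda)|^2, \\
h_{11}(\lambda) h_{22}(\lambda) > |h_{12}(\lambda)|^2, \\
\epsilon > (g_{11}(\lambda) - f_{11}(\lambda))(g_{22}(\lambda) - f_{22}(\lambda)) > |g_{12}(\lambda) - f_{12}(\lambda)|^2, \\
\epsilon > (f_{11}(\lambda) - h_{11}(\lambda))(f_{22}(\lambda) - h_{22}(\lambda)) > |f_{12}(\lambda) - h_{12}(\lambda)|^2, \\
|g_{ii}(\lambda) - h_{ii}(\lambda)| < \epsilon,
\end{gathered}
\]

$i = 1, 2$, for all $\lambda$. Let

\[
G_{ij} = \{ {}_{ij}g_{k-l};\ k, l = 1, \dots, N \}, \qquad
H_{ij} = \{ {}_{ij}h_{k-l};\ k, l = 1, \dots, N \},
\]

\[
G = \begin{pmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{pmatrix}, \qquad
H = \begin{pmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{pmatrix},
\]

and

\[
g(\lambda) = \begin{pmatrix} g_{11}(\lambda) & g_{12}(\lambda) \\ g_{21}(\lambda) & g_{22}(\lambda) \end{pmatrix}, \qquad
h(\lambda) = \begin{pmatrix} h_{11}(\lambda) & h_{12}(\lambda) \\ h_{21}(\lambda) & h_{22}(\lambda) \end{pmatrix}.
\]

Then

\[
z'Gz = \int_{-\pi}^{\pi} z(-\lambda)' g(\lambda)\, z(-\lambda)\, d\lambda
\ge \int_{-\pi}^{\pi} z(-\lambda)' f(\lambda)\, z(-\lambda)\, d\lambda
\ge \int_{-\pi}^{\pi} z(-\lambda)' h(\lambda)\, z(-\lambda)\, d\lambda = z'Hz,
\]

so that

\[
G \ge R \ge H. \tag{4.4}
\]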

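Clearly, $D_N^{-1}\Phi' G \Phi D_N^{-1} \ge D_N^{-1}\Phi' R \Phi D_N^{-1} \ge D_N^{-1}\Phi' H \Phi D_N^{-1}$. We shall obtain the
limit of $D_N^{-1}\Phi' G \Phi D_N^{-1}$ as $N \to \infty$. This matrix is easier to deal with, since ${}_{ij}g_{k-l} = 0$
if $|k - l| > q$. A typical element of the matrix in question is

\[
\sum_{t,\tau=1}^{N} \frac{\overline{{}_i\varphi_t^{(\nu)}}\; {}_{ij}g_{t-\tau}\; {}_j\varphi_\tau^{(\mu)}}{\sqrt{{}_i\Phi_N^{(\nu)}\; {}_j\Phi_N^{(\mu)}}}
= \sum_{k=0}^{q} {}_{ij}g_k \sum_{\tau=1}^{N-k} \frac{\overline{{}_i\varphi_{\tau+k}^{(\nu)}}\; {}_j\varphi_\tau^{(\mu)}}{\sqrt{{}_i\Phi_N^{(\nu)}\; {}_j\Phi_N^{(\mu)}}}
+ \sum_{k=-q}^{-1} {}_{ij}g_k \sum_{\tau=1-k}^{N} \frac{\overline{{}_i\varphi_{\tau+k}^{(\nu)}}\; {}_j\varphi_\tau^{(\mu)}}{\sqrt{{}_i\Phi_N^{(\nu)}\; {}_j\Phi_N^{(\mu)}}},
\]

which approaches

\[
\sum_{k} {}_{ij}g_k\; {}_{ij}M_k^{(\nu,\mu)} = 2\pi \int_{-\pi}^{\pi} g_{ij}(-\lambda)\, d\,{}_{ij}M^{(\nu,\mu)}(\lambda)
\]

as $N \to \infty$, so that

\[
\lim_{N \to \infty} D_N^{-1}\Phi' G \Phi D_N^{-1} = 2\pi
\begin{pmatrix}
\int_{-\pi}^{\pi} g_{11}(-\lambda)\, d\,{}_{11}M(\lambda) & \int_{-\pi}^{\pi} g_{12}(-\lambda)\, d\,{}_{12}M(\lambda) \\
\int_{-\pi}^{\pi} g_{21}(-\lambda)\, d\,{}_{21}M(\lambda) & \int_{-\pi}^{\pi} g_{22}(-\lambda)\, d\,{}_{22}M(\lambda)
\end{pmatrix}.
\]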

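In like manner, one can show that

\[
\lim_{N \to \infty} D_N^{-1}\Phi' H \Phi D_N^{-1} = 2\pi
\begin{pmatrix}
\int_{-\pi}^{\pi} h_{11}(-\lambda)\, d\,{}_{11}M(\lambda) & \int_{-\pi}^{\pi} h_{12}(-\lambda)\, d\,{}_{12}M(\lambda) \\
\int_{-\pi}^{\pi} h_{21}(-\lambda)\, d\,{}_{21}M(\lambda) & \int_{-\pi}^{\pi} h_{22}(-\lambda)\, d\,{}_{22}M(\lambda)
\end{pmatrix}.
\]

Making use of the inequality (4.4), on letting $\epsilon \to 0$ we see that

\[
\lim_{N \to \infty} D_N^{-1}\Phi' R \Phi D_N^{-1} = 2\pi
\begin{pmatrix}
\int_{-\pi}^{\pi} f_{11}(-\lambda)\, d\,{}_{11}M(\lambda) & \int_{-\pi}^{\pi} f_{12}(-\lambda)\, d\,{}_{12}M(\lambda) \\
\int_{-\pi}^{\pi} f_{21}(-\lambda)\, d\,{}_{21}M(\lambda) & \int_{-\pi}^{\pi} f_{22}(-\lambda)\, d\,{}_{22}M(\lambda)
\end{pmatrix}. \tag{4.5}
\]

This limiting matrix will be shown to be nonsingular in Section 5. Thus (4.1)
is valid.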
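
A simple sanity check of Theorem 1 is the case of a constant regression in both components, where the least-squares estimate is the sample mean, $D_N = \sqrt{N}\,I$, and the regression spectrum has a single unit jump at $\lambda = 0$, so that (4.1) reduces to $2\pi f(0) = \sum_h r_h$. The sketch below assumes a hypothetical real vector moving average $x_t = e_t + B e_{t-1}$ with identity innovation covariance; the model and helper names are illustrative only.

```python
# Hypothetical MA(1) coefficient; r_0 = I + B B', r_1 = B, r_{-1} = B'.
B = [[0.5, 0.3], [-0.2, 0.4]]
ZERO = [[0.0, 0.0], [0.0, 0.0]]

def mat_add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def mat_scale(a, s):
    return [[s * a[i][j] for j in range(2)] for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def r(h):
    """Covariance sequence r_h = E x_{t+h} x_t' of the assumed MA(1) residual."""
    if h == 0:
        bbt = [[sum(B[i][k] * B[j][k] for k in range(2)) for j in range(2)]
               for i in range(2)]
        return mat_add([[1.0, 0.0], [0.0, 1.0]], bbt)
    if h == 1:
        return [row[:] for row in B]
    if h == -1:
        return transpose(B)
    return ZERO

def scaled_cov_of_mean(N):
    """N * Cov(sample mean) = sum over lags h of (1 - |h|/N) r_h."""
    total = ZERO
    for h in range(-(N - 1), N):
        total = mat_add(total, mat_scale(r(h), (N - abs(h)) / N))
    return total

limit = mat_add(r(0), mat_add(r(1), r(-1)))   # equals 2*pi*f(0)
finite = scaled_cov_of_mean(2000)
max_err = max(abs(finite[i][j] - limit[i][j]) for i in range(2) for j in range(2))
```

The finite-$N$ value differs from the limit only by the term $(r_1 + r_{-1})/N$, which is why `max_err` shrinks at rate $1/N$ in this example.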


5. The Markov estimate. 

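THEOREM 2. Under the assumptions made in Section 2 on the spectrum $f(\lambda)$
of the process $x_t$ and the assumptions made in Section 3 on the regression of $y_t$,

\[
\lim_{N \to \infty} D_N E(c_M^* - c)(c_M^* - c)' D_N
= 2\pi \begin{pmatrix}
\displaystyle\int_{-\pi}^{\pi} \frac{f_{22}(-\lambda)\, d\,{}_{11}M(\lambda)}{f_{11}(-\lambda) f_{22}(-\lambda) - |f_{12}(-\lambda)|^2}
& \displaystyle -\int_{-\pi}^{\pi} \frac{f_{12}(-\lambda)\, d\,{}_{12}M(\lambda)}{f_{11}(-\lambda) f_{22}(-\lambda) - |f_{12}(-\lambda)|^2} \\[2ex]
\displaystyle -\int_{-\pi}^{\pi} \frac{f_{21}(-\lambda)\, d\,{}_{21}M(\lambda)}{f_{11}(-\lambda) f_{22}(-\lambda) - |f_{12}(-\lambda)|^2}
& \displaystyle \int_{-\pi}^{\pi} \frac{f_{11}(-\lambda)\, d\,{}_{22}M(\lambda)}{f_{11}(-\lambda) f_{22}(-\lambda) - |f_{12}(-\lambda)|^2}
\end{pmatrix}^{-1}. \tag{5.1}
\]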

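In discussing the Markov estimate it will be enough to consider $D_N^{-1}\Phi' R^{-1} \Phi D_N^{-1}$.
We shall again approximate $R$ above and below by positive definite matrices
of a simpler form. Here we will approximate $f_{11}(\lambda)$, $f_{22}(\lambda)$ by the absolute square
of reciprocals of finite trigonometric polynomials, while $f_{12}(\lambda)$ will be approxi-
mated as before by a finite trigonometric polynomial. Given any $\epsilon > 0$, we
can find finite trigonometric polynomials $\alpha_i(\lambda)$, $\beta_i(\lambda)$, $\gamma_i(\lambda)$ of degree at most $q$,
$i = 1, 2$, satisfying the inequalities

\[
\begin{gathered}
|\alpha_1(\lambda)|^{-2} \ge f_{11}(\lambda) \ge |\alpha_2(\lambda)|^{-2}, \\
|\beta_1(\lambda)|^{-2} \ge f_{22}(\lambda) \ge |\beta_2(\lambda)|^{-2}, \\
|\alpha_1(\lambda)\beta_1(\lambda)|^{-2} > |\gamma_1(\lambda)|^2, \\
|\alpha_2(\lambda)\beta_2(\lambda)|^{-2} > |\gamma_2(\lambda)|^2, \\
\epsilon > (|\alpha_1(\lambda)|^{-2} - f_{11}(\lambda))(|\beta_1(\lambda)|^{-2} - f_{22}(\lambda)) > |\gamma_1(\lambda) - f_{12}(\lambda)|^2, \\
\epsilon > (f_{11}(\lambda) - |\alpha_2(\lambda)|^{-2})(f_{22}(\lambda) - |\beta_2(\lambda)|^{-2}) > |f_{12}(\lambda) - \gamma_2(\lambda)|^2, \\
|\alpha_1(\lambda)|^{-2} - |\alpha_2(\lambda)|^{-2} < \epsilon, \\
|\beta_1(\lambda)|^{-2} - |\beta_2(\lambda)|^{-2} < \epsilon
\end{gathered} \tag{5.3}
\]

for all $\lambda$. Let

\[
{}_iR = \begin{pmatrix} {}_iR_{11} & {}_iR_{12} \\ {}_iR_{21} & {}_iR_{22} \end{pmatrix}
\]

be the covariance matrix of $y$ when the process $x_t$ is such that $|\alpha_i(\lambda)|^{-2}$, $|\beta_i(\lambda)|^{-2}$
are the spectral densities of the components ${}_1x_t$, ${}_2x_t$, respectively, while $\gamma_i(\lambda)$ is
the cross-spectral density of ${}_1x_t$ and ${}_2x_t$, $i = 1, 2$. It is clear that ${}_1R$ and ${}_2R$
are nonsingular and that

\[
{}_1R \ge R \ge {}_2R. \tag{5.4}
\]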


For the moment let us assume that R is of the same form as one of the mat- 
rices ,R. Then 


=] ~] =] 
Ry A A’ 9 Ree = W w’ 


al 
where Aj; a; and wi; = 8; unless 1,7 S q or 

N—-—i,N —j Sq (ax = 0, & = 0, ve = O if |k| > gq). 
It is also clear that the (7, 7)th element of Ry is y;-;. Let P = AR,» w’. Then 


Ata A Pw’ ) 


R = 


1 1 --] 1 
w PA” ww’ 


re 0 (’ P re 0 
(lo w/\P’ ) 0 wt) 
( r FF ( I —P 
and 
Pp’ I — P’ I 


are nonnegative definite. Moreover, they commute. Since the matrices commute,
it is clear that their product is nonnegative definite and that $I - PP'$, $I -
P'P \ge 0$.

We would like to show that the maximal eigenvalue $\lambda_m$ of either $PP'$ or $P'P$
is less than one and bounded away from one as $N \to \infty$; that is, $\lambda_m < 1 - \epsilon$ as
$N \to \infty$ for some $\epsilon > 0$. This can be seen by noting that the minimal eigen-
values of both
( I ") I —P 
and 
Pr 7 — Pp’ I 


are bounded away from zero as N — ~. It will be enough to show this for 


( I °) 
rf 
Let u be any 2N vector. Then 


3 ( I ") . sa Ris 
Uu u=v0U 
Y 2 Ry Re 


@ 0 
eu 
0 Rz 


It is clear that both 


= w'v 


a 


u= ec’u'u 







A 0 
ts N >, where gd > Oand v= w'( ) Nom 
QO w 


i 0 5 "y( 0 
R- = , 
0 w’/\P’ J 0 .) 

L P\* (I — PP’)* —P(I — P’P)" 

P’ I —-P(I— PP)* (— PP) 
I —P\/(UI — PP)" 0 
a <4 0 i - ane 

We can write 


(I — PP’)* 0 « /(PP’)* 0 ) 
(5.5) ( _)= = ( : 
0 (I — P’P) k=O 0 (P’P)* 


(5.6) since 0 S PP’, P’P s (1 — ©. But then 


Dy'®’R™“D;' 


:( iDy' :8'A'(PP’)'A 1 Da a 


k=O 


ei > Q. 


k=0 


Now the (», u)th element of ,Dy',®’A’P(P’P)*w@.Dy' is 


—2Dyx' 'w'P'(PP’)*A Dy" 2Dy' 'w'(P’P)*w @ Dy 


(¥) 


(m) 
1t t.72Pr 
be ? 
= ( ( 
— Vv ey? y 


Ze J— 


kis | eNO (d) |ae(r) |**? \a(a)|"? l-y(a) |” dd 


= Spor 


unless ¢, 7 S 12(k + 1)gorN —t, N — + S 12(k + 1)g. Note that s; = 0 if 
\j| > 12(k + 1)g. It is clear that each of the terms 


— 
(¥) (u) 
Pt Mt. Wr 


Va 























approaches zero as N — , since |h;,,| is uniformly bounded in ¢, r, N and 


(») 2 (¥) 


Yt | /iPw —0 
as N — o for fixed ¢. But then 


N ¥) (ps) 
. vr his Wr 
lim 7 
/ (¥) ( 
eo gy 





) 


N (FF) \s 
. 1 k 2 
= > » int, See 


o(k-4 atte aa ae”) x (wm) 
k| S12(k+1)¢ N+o r=] iPy oP 


= LeuMi” = an | ¥(—A) Ja(—ar)|*** |a(—a) |" |y(—a) |” 
x dyM”” (a). 

But then 

lim ,Dy' ,®’AP(P’P)‘w Dy 


: . 
1 [ (—A) |a(—d)|** |B(—v)|** |y(—r)|* duMQ). 


Similarly, one can show that 
lim ,Dy' :®’A’(PP’)*A :®1Dx" 
N-+ox 


™ = | |a(—r)|*** |8(—r)|* |y(—d)|* duM(a) 
at J—az 
and 
lim 2D" 2'w'(P’P)*w #0 Dy 
= 5 | la(—»)\ [a(—»)/** |x(-2)* dat Q). 
tT Jme 


- 


Making use of (5.6), it then follows that 


(5.7) lim Dy'#’R“®D;' = lim > Q = > lim Q, 
N+ N+ k=O k= Now 
Ss Sas |(—d)|” 
1 | Je 1 — ja(—Ad)|? |8(—d)/? |y(—A) 
on fo CW att lar 
- 1 — Ja(—d)|? |8(—d)/? ly(—a 


[" __ (=) Ja(—r)/? |a(—»)? 
J_z 1 — |a(—d)|? |8(—A)}* |y(—))?? 





2 d uM (a) 
2 d 21M (A) 


di2M (a) 





7 \8(—»)|" f 
[. 1 — la(—d)|? |@(—d)|? (aay tO) 







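We have thus found $\lim_{N \to \infty} D_N^{-1}\Phi' R^{-1} \Phi D_N^{-1}$ when $R$ is of the same form as one
of the matrices ${}_iR$. Let $R$ now be of the general form, and approximate above
and below by ${}_1R$ and ${}_2R$, respectively. On letting $\epsilon \to 0$, we see that

\[
\lim_{N \to \infty} D_N^{-1}\Phi' R^{-1} \Phi D_N^{-1}
= \frac{1}{2\pi} \begin{pmatrix}
\displaystyle\int_{-\pi}^{\pi} \frac{f_{22}(-\lambda)\, d\,{}_{11}M(\lambda)}{f_{11}(-\lambda) f_{22}(-\lambda) - |f_{12}(-\lambda)|^2}
& \displaystyle -\int_{-\pi}^{\pi} \frac{f_{12}(-\lambda)\, d\,{}_{12}M(\lambda)}{f_{11}(-\lambda) f_{22}(-\lambda) - |f_{12}(-\lambda)|^2} \\[2ex]
\displaystyle -\int_{-\pi}^{\pi} \frac{f_{21}(-\lambda)\, d\,{}_{21}M(\lambda)}{f_{11}(-\lambda) f_{22}(-\lambda) - |f_{12}(-\lambda)|^2}
& \displaystyle \int_{-\pi}^{\pi} \frac{f_{11}(-\lambda)\, d\,{}_{22}M(\lambda)}{f_{11}(-\lambda) f_{22}(-\lambda) - |f_{12}(-\lambda)|^2}
\end{pmatrix}. \tag{5.8}
\]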
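We can show that the matrix (5.8) is nonsingular. Clearly

\[
\begin{pmatrix} \Delta\,{}_{11}M(\lambda) & \Delta\,{}_{12}M(\lambda) \\ \Delta\,{}_{21}M(\lambda) & \Delta\,{}_{22}M(\lambda) \end{pmatrix} \ge 0
\]

and $f_{11}(\lambda) f_{22}(\lambda) > |f_{12}(\lambda)|^2$ for all $\lambda$. Now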
fo(—d)zAuM()a om f(—d)z14»M (A)z2 
—fu(—dr)z2A aM (A) + fu(—d)z2A 2M (A)ze 


| (—X) j2 , ' (—d . 
a me ) zAnM(aA)a ; (fu(—») fis )| 
Ju(—A) on 


5 AoM ( 
feo(—X) ) taaM Nery 


= (‘2(-» — 


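so that

\[
\frac{1}{f_{11}(-\lambda) f_{22}(-\lambda) - |f_{12}(-\lambda)|^2}
\begin{pmatrix}
f_{22}(-\lambda)\, \Delta\,{}_{11}M(\lambda) & -f_{12}(-\lambda)\, \Delta\,{}_{12}M(\lambda) \\
-f_{21}(-\lambda)\, \Delta\,{}_{21}M(\lambda) & f_{11}(-\lambda)\, \Delta\,{}_{22}M(\lambda)
\end{pmatrix}
\ge \epsilon \begin{pmatrix} \Delta\,{}_{11}M(\lambda) & 0 \\ 0 & \Delta\,{}_{22}M(\lambda) \end{pmatrix},
\]

where $\epsilon > 0$. But then matrix (5.8) is greater than or equal to

\[
\frac{\epsilon}{2\pi} \begin{pmatrix} {}_1M & 0 \\ 0 & {}_2M \end{pmatrix}
\]

and hence is nonsingular. Relation (5.1) is valid. It follows that the limiting
matrix (4.5) is also nonsingular.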


6. Asymptotic efficiency of least-squares estimate among linear unbiased 
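estimates when observed process is complex-valued. We want to find out
for what types of regression the least-squares estimate is asymptotically
efficient among the linear unbiased estimates for any admissible spectral
density matrix $f(\lambda)$. This amounts to asking for the conditions on $M(\lambda)$
such that matrix (4.1) is equal to matrix (5.1); that is,

\[
\begin{pmatrix}
\int_{-\pi}^{\pi} f_{11}(-\lambda)\, d\,{}_{11}M(\lambda) & \int_{-\pi}^{\pi} f_{12}(-\lambda)\, d\,{}_{12}M(\lambda) \\
\int_{-\pi}^{\pi} f_{21}(-\lambda)\, d\,{}_{21}M(\lambda) & \int_{-\pi}^{\pi} f_{22}(-\lambda)\, d\,{}_{22}M(\lambda)
\end{pmatrix}
\begin{pmatrix} {}_1M^{-1} & 0 \\ 0 & {}_2M^{-1} \end{pmatrix}
\begin{pmatrix}
\displaystyle\int_{-\pi}^{\pi} \frac{f_{22}(-\lambda)\, d\,{}_{11}M(\lambda)}{f_{11}(-\lambda) f_{22}(-\lambda) - |f_{12}(-\lambda)|^2}
& \displaystyle -\int_{-\pi}^{\pi} \frac{f_{12}(-\lambda)\, d\,{}_{12}M(\lambda)}{f_{11}(-\lambda) f_{22}(-\lambda) - |f_{12}(-\lambda)|^2} \\[2ex]
\displaystyle -\int_{-\pi}^{\pi} \frac{f_{21}(-\lambda)\, d\,{}_{21}M(\lambda)}{f_{11}(-\lambda) f_{22}(-\lambda) - |f_{12}(-\lambda)|^2}
& \displaystyle \int_{-\pi}^{\pi} \frac{f_{11}(-\lambda)\, d\,{}_{22}M(\lambda)}{f_{11}(-\lambda) f_{22}(-\lambda) - |f_{12}(-\lambda)|^2}
\end{pmatrix}
= \begin{pmatrix} {}_1M & 0 \\ 0 & {}_2M \end{pmatrix}. \tag{6.1}
\]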


Let us first see what restraints are imposed on the regression spectrum if we
require asymptotic efficiency of the least-squares estimate in the smaller class
of spectra $f(\lambda)$ where there is no cross-correlation, that is, where $f_{12}(\lambda) \equiv 0$.
Since ${}_1y_t$ and ${}_2y_t$ are uncorrelated, they can be treated separately. We make use
of the results of [5] where the problem of estimating the regression coefficients
of a 1-dimensional process with stationary residuals is discussed. The following
restraints on the regression spectrum follow immediately from these results.
The nondecreasing function ${}_{ii}M(\lambda)$ increases only on a finite set of points
${}_i\lambda_j$, $j = 1, \dots, q_i$, where $q_i \le p_i$, $i = 1, 2$. The jump of ${}_{ii}M(\lambda)$ at ${}_i\lambda_j$ is
$\Delta\,{}_{ii}M({}_i\lambda_j) = {}_{ii}M({}_i\lambda_j+) - {}_{ii}M({}_i\lambda_j-) > 0$ and

\[
\Delta\,{}_{ii}M({}_i\lambda_j)\; {}_iM^{-1}\; \Delta\,{}_{ii}M({}_i\lambda_k) = \delta_{jk}\, \Delta\,{}_{ii}M({}_i\lambda_j), \qquad i = 1, 2. \tag{6.2}
\]

The sum of the ranks of the matrices $\Delta\,{}_{ii}M({}_i\lambda_j)$, $j = 1, \dots, q_i$, is $p_i$. It is
then clear that the set of points of increase of the nondecreasing function $M(\lambda)$
is the set of points $\{\lambda_j\}$ consisting of the points ${}_1\lambda_k$, $k = 1, \dots, q_1$, and ${}_2\lambda_k$,
$k = 1, \dots, q_2$. For convenience let ${}_{ij}M_k = {}_{ij}M(\lambda_k+) - {}_{ij}M(\lambda_k-)$. Relations
(6.2) can then be rewritten ${}_{ii}M_j\; {}_iM^{-1}\; {}_{ii}M_k = \delta_{jk}\, {}_{ii}M_j$, $i = 1, 2$. Here either
${}_{11}M_j > 0$ or ${}_{22}M_j > 0$. We shall obtain additional restraints on the regression
spectrum and thereby show that the sets of points $\{\lambda_j\}$, $\{{}_1\lambda_j\}$ and $\{{}_2\lambda_j\}$ are the
same.

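Let us now see what additional restraints on the regression spectrum are
implied by asymptotic efficiency of the least-squares estimate when the spec-
trum is such that $f_{11}(\lambda) = f_{22}(\lambda) = 1$ and $f_{12}(-\lambda_j) = \alpha_j$, $|\alpha_j|^2 < 1$. The condition
for asymptotic efficiency is then

\[
\sum_j \begin{pmatrix} {}_{11}M_j & \alpha_j\, {}_{12}M_j \\ \bar\alpha_j\, {}_{21}M_j & {}_{22}M_j \end{pmatrix}
\begin{pmatrix} {}_1M^{-1} & 0 \\ 0 & {}_2M^{-1} \end{pmatrix}
\sum_k \frac{1}{1 - |\alpha_k|^2} \begin{pmatrix} {}_{11}M_k & -\alpha_k\, {}_{12}M_k \\ -\bar\alpha_k\, {}_{21}M_k & {}_{22}M_k \end{pmatrix}
= \begin{pmatrix} {}_1M & 0 \\ 0 & {}_2M \end{pmatrix}. \tag{6.3}
\]

If all the $\alpha$'s are zero except for one, say $\alpha_j$, equation (6.3) reduces to

\[
\begin{pmatrix} {}_{11}M_j & -\alpha_j\, {}_{12}M_j \\ -\bar\alpha_j\, {}_{21}M_j & {}_{22}M_j \end{pmatrix}
- \begin{pmatrix} {}_{12}M_j\, {}_2M^{-1}\, {}_{21}M_j & -\alpha_j\, {}_{12}M_j\, {}_2M^{-1}\, {}_{22}M_j \\ -\bar\alpha_j\, {}_{21}M_j\, {}_1M^{-1}\, {}_{11}M_j & {}_{21}M_j\, {}_1M^{-1}\, {}_{12}M_j \end{pmatrix} = 0,
\]

so that ${}_{ik}M_j\; {}_kM^{-1}\; {}_{kl}M_j = {}_{il}M_j$, $i, k, l = 1, 2$, $i \ne k$. If all the $\alpha$'s are zero
except for two, say $\alpha_j$ and $\alpha_k$, equation (6.3) reduces to

\[
\frac{1}{1 - |\alpha_k|^2}
\begin{pmatrix} 0 & \alpha_j\, {}_{12}M_j \\ \bar\alpha_j\, {}_{21}M_j & 0 \end{pmatrix}
\begin{pmatrix} {}_1M^{-1} & 0 \\ 0 & {}_2M^{-1} \end{pmatrix}
\begin{pmatrix} |\alpha_k|^2\, {}_{11}M_k & -\alpha_k\, {}_{12}M_k \\ -\bar\alpha_k\, {}_{21}M_k & |\alpha_k|^2\, {}_{22}M_k \end{pmatrix}
+ \frac{1}{1 - |\alpha_j|^2}
\begin{pmatrix} 0 & \alpha_k\, {}_{12}M_k \\ \bar\alpha_k\, {}_{21}M_k & 0 \end{pmatrix}
\begin{pmatrix} {}_1M^{-1} & 0 \\ 0 & {}_2M^{-1} \end{pmatrix}
\begin{pmatrix} |\alpha_j|^2\, {}_{11}M_j & -\alpha_j\, {}_{12}M_j \\ -\bar\alpha_j\, {}_{21}M_j & |\alpha_j|^2\, {}_{22}M_j \end{pmatrix} = 0. \tag{6.4}
\]

But equation (6.4) cannot hold for all values of $\alpha_j$, $\alpha_k$ less than one in absolute
value unless

\[
{}_{il}M_j\; {}_lM^{-1}\; {}_{ls}M_k = 0, \qquad j \ne k; \quad i, l, s = 1, 2; \quad i \ne l.
\]

All other relations of this type can be obtained analogously from the matrix
equation resulting from the interchange of the first and third matrix on the
left side of equation (6.3). This equation obviously also holds if the least-squares
estimate is asymptotically efficient. All the restraints on the matrices ${}_{ij}M_k$ can
be written briefly

\[
{}_{ij}M_k\; {}_jM^{-1}\; {}_{jl}M_s = \delta_{ks}\, {}_{il}M_k, \tag{6.5}
\]

where $i, j, l = 1, 2$ and $k, s = 1, \dots, q$ where $q = q_1 = q_2$. It is clear that
equations (6.5) cannot hold unless both ${}_{11}M_k$, ${}_{22}M_k > 0$ and have the same rank.
Thus asymptotic efficiency of the least-squares estimate implies that $p_1 = p_2$.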

THEOREM 3. The following conditions are necessary if the least-squares estimate
is to be asymptotically efficient for all admissible $f(\lambda)$. The function $M(\lambda)$ is a jump
function with a finite number of jumps $\lambda_1, \dots, \lambda_q$, where $q \le p = p_1 = p_2$. Let
the jumps be

\[
{}_{ij}M_k = {}_{ij}M(\lambda_k+) - {}_{ij}M(\lambda_k-).
\]

Then ($A > 0$ if $A$ is a nonnegative definite matrix but not the null matrix)
${}_{ii}M_k > 0$, $i = 1, 2$ and $k = 1, \dots, q$ and

\[
{}_{ij}M_k\; {}_jM^{-1}\; {}_{jl}M_s = \delta_{ks}\, {}_{il}M_k, \tag{6.6}
\]

where $i, j, l = 1, 2$ and $k, s = 1, \dots, q$. ${}_{11}M_k$ and ${}_{22}M_k$ have the same rank. The
sum of the ranks of ${}_{ii}M_k$, $k = 1, \dots, q$, is $p$, $i = 1, 2$. These conditions can
easily be seen to be sufficient for the least-squares estimate to be asymptotically effi-
cient for all admissible $f(\lambda)$.

It is of especial interest to consider the case in which both components ${}_1y_t$, ${}_2y_t$
of the observed process have a mixed trigonometric and polynomial regression
and the regression vectors of both components are the same; that is,

\[
\begin{aligned}
{}_1\varphi_t^{(\nu)} &= {}_2\varphi_t^{(\nu)} = t^\nu e^{-it\lambda_1}, & \nu &= 0, 1, \dots, s_1, \\
{}_1\varphi_t^{(s_1+1+\nu)} &= {}_2\varphi_t^{(s_1+1+\nu)} = t^\nu e^{-it\lambda_2}, & \nu &= 0, 1, \dots, s_2, \\
&\;\;\vdots \\
{}_1\varphi_t^{(s_1+\cdots+s_{k-1}+k-1+\nu)} &= {}_2\varphi_t^{(s_1+\cdots+s_{k-1}+k-1+\nu)} = t^\nu e^{-it\lambda_k}, & \nu &= 0, 1, \dots, s_k,
\end{aligned}
\]

where $\lambda_1, \dots, \lambda_q$ are distinct. The least-squares estimate can be seen to be asymp-
totically efficient in the case of such a regression. The jumps of $M(\lambda)$ are
at $\lambda_1, \dots, \lambda_q$ and

\[
{}_{ij}M_k = \begin{pmatrix} 0 & 0 & 0 \\ 0 & M_k & 0 \\ 0 & 0 & 0 \end{pmatrix},
\]

where $M_k$ here denotes the $(s_k + 1) \times (s_k + 1)$ matrix

\[
M_k = \left\{ \frac{\sqrt{(2\mu + 1)(2\nu + 1)}}{\mu + \nu + 1};\ \mu, \nu = 0, 1, \dots, s_k \right\}
\]

and the null submatrix in the upper left-hand corner of ${}_{ij}M_k$ is of order
$\sum_{j=1}^{k-1} (s_j + 1)$. It is clear that equations (6.6) are satisfied in this case. One
should note that if the regression vectors of the two components are unequal in num-
ber, the least-squares estimate is not asymptotically efficient. This, for example,
would be the case if one component had a linear regression and the other a quad-
ratic regression.
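
The contrast between equal and unequal sets of regression vectors can already be seen in finite samples for one special admissible spectrum. The sketch below assumes residuals that are uncorrelated in time but correlated across the two components (so $R = \Sigma \otimes I$, an assumption made only for this illustration), and compares the least-squares and Markov covariance matrices for a linear/linear and a linear/quadratic pair of polynomial regressions. It does not verify the asymptotic statement for all admissible spectra.

```python
import numpy as np

def block_diag2(a, b):
    """Phi = diag(a, b) for the two components."""
    out = np.zeros((a.shape[0] + b.shape[0], a.shape[1] + b.shape[1]))
    out[:a.shape[0], :a.shape[1]] = a
    out[a.shape[0]:, a.shape[1]:] = b
    return out

def ls_covariance(Phi, R):
    A = np.linalg.solve(Phi.T @ Phi, Phi.T)
    return A @ R @ A.T

def markov_covariance(Phi, R):
    return np.linalg.inv(Phi.T @ np.linalg.solve(R, Phi))

N = 30
t = np.arange(1, N + 1) / N
lin = np.column_stack([np.ones(N), t])             # linear regression
quad = np.column_stack([np.ones(N), t, t ** 2])    # quadratic regression

# Assumed residual structure: white in time, cross-correlation 0.8.
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
R = np.kron(Sigma, np.eye(N))

same = block_diag2(lin, lin)     # both components: same regression vectors
mixed = block_diag2(lin, quad)   # unequal numbers of regression vectors

gap_same = ls_covariance(same, R) - markov_covariance(same, R)
gap_mixed = ls_covariance(mixed, R) - markov_covariance(mixed, R)
```

In this setting `gap_same` vanishes (the two estimates coincide), while `gap_mixed` is a nonzero nonnegative definite matrix, in line with the remark about a linear regression in one component and a quadratic regression in the other.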


7. Asymptotic efficiency of the least squares estimate when the observed
process is real-valued. The case of greatest interest is that in which the
process $y_t$ and the regression vectors are real. This condition imposes addi-
tional restraints on the spectrum of the process and the regression spectrum.
Then

\[
\begin{aligned}
f_{ii}(\lambda) &= f_{ii}(-\lambda), \qquad i = 1, 2, \\
f_{12}(\lambda) &= f_{21}(-\lambda),
\end{aligned} \tag{7.1}
\]

and

\[
dM(\lambda) = \overline{dM(-\lambda)}. \tag{7.2}
\]

We shall obtain necessary and sufficient conditions on the regression spectrum
for the least-squares estimate to be asymptotically efficient for such a process.
The derivation of these conditions is analogous to that followed in Section 6.


Just as in Section 6 one can see that asymptotic efficiency of the least-squares
estimate implies that there are only a finite number of points of increase of $M(\lambda)$.
Because of (7.2) we need only consider the nonnegative points of increase of
$M(\lambda)$. Let the nonnegative points of increase be $\lambda_1 = 0 < \lambda_2 < \cdots < \lambda_q$. Of
course zero needn't be one of the points of increase but we include it because if
it is, the condition on the jump at zero is different from that on jumps at other
points. Let ${}_{ij}M_k$ denote the jump of ${}_{ij}M(\lambda)$ at $\lambda_k$. Given the matrix $A$, let $\mathrm{Re}(A)$
and $\mathrm{Im}(A)$ be the matrices whose elements are the real and imaginary parts,
respectively, of the corresponding elements of $A$. Equation (6.1) can then be
rewritten as

\[
{\sum_j}' \, 2 \begin{pmatrix} {}_{11}f_j\, \mathrm{Re}({}_{11}M_j) & \mathrm{Re}({}_{12}f_j\, {}_{12}M_j) \\ \mathrm{Re}({}_{21}f_j\, {}_{21}M_j) & {}_{22}f_j\, \mathrm{Re}({}_{22}M_j) \end{pmatrix}
\begin{pmatrix} {}_1M^{-1} & 0 \\ 0 & {}_2M^{-1} \end{pmatrix}
{\sum_k}' \, \frac{2}{{}_{11}f_k\, {}_{22}f_k - |{}_{12}f_k|^2}
\begin{pmatrix} {}_{22}f_k\, \mathrm{Re}({}_{11}M_k) & -\mathrm{Re}({}_{12}f_k\, {}_{12}M_k) \\ -\mathrm{Re}({}_{21}f_k\, {}_{21}M_k) & {}_{11}f_k\, \mathrm{Re}({}_{22}M_k) \end{pmatrix}
= \begin{pmatrix} {}_1M & 0 \\ 0 & {}_2M \end{pmatrix}, \tag{7.3}
\]

making use of (7.1) and (7.2). Here, ${}_{ij}f_k = f_{ij}(\lambda_k)$. The primed summations indi-
cate that the coefficient 2 in the summation is to be replaced by coefficient one
when either $j$ or $k = 1$, since $\lambda_1 = 0$. Because of (7.2) we can see that the mat-
rices ${}_{ij}M_1$ have real elements. The fact that $f_{12}(\lambda) = f_{21}(-\lambda)$ indicates that ${}_{ij}f_1$,
$i, j = 1, 2$, is real. Now equation (7.3) is assumed to hold for all ${}_{11}f_j$, ${}_{22}f_j > 0$
and all ${}_{12}f_j$ such that $|{}_{12}f_j|^2 < {}_{11}f_j\, {}_{22}f_j$. A discussion of equation (7.3) analogous to
that carried out in Section 6 indicates that the equation cannot be valid under
these conditions unless the following restraints on the matrices ${}_{ij}M_k$ are satisfied:

2 Re (;;M1);M~ Re (#M,) = 6, Re (aM) 
if | ~ 0; 

iM, iM 2M; — nM ’ 

(7.4) a 

2Im(,;;M1);M Im(;:M;) = —dby Re(;:M.), 


if i ~ jandl, k $1, since Im(,j;M,) = 0; 


Re(,;M:z),M~ Im(j.Mx) = Im(;.M,):M™ Re(,M;,), 


where j ~ land k ¥ 1; and finally, 


Re(;;Mx);M~ Im(j.M,) = Im(j:Mi).M~™ Re(..M,) = 0 


if $k \ne s$. It can also be readily seen that equation (7.3) will be satisfied for all ad-
missible spectra of the process if the conditions (7.4) just derived are satisfied.
THEOREM 4. The least-squares estimate is asymptotically efficient for all admissi-
ble spectra of the process if and only if the regression spectrum is a jump spectrum
with a finite number of jumps and the matrices ${}_{ij}M_k$ satisfy the conditions (7.4).









It is again of special interest to consider the case in which both components
${}_1y_t$, ${}_2y_t$ of the observed process have a mixed trigonometric and polynomial
regression and the regression vectors of both components are the same, so that

\[
\begin{aligned}
{}_1\varphi_t^{(\nu)} &= {}_2\varphi_t^{(\nu)} = t^\nu, & \nu &= 0, 1, \dots, s_1, \\
{}_1\varphi_t^{(s_1+1+\nu)} &= {}_2\varphi_t^{(s_1+1+\nu)} = t^\nu \cos t\lambda_2, & \nu &= 0, 1, \dots, s_2, \\
{}_1\varphi_t^{(s_1+s_2+2+\nu)} &= {}_2\varphi_t^{(s_1+s_2+2+\nu)} = t^\nu \sin t\lambda_2, & \nu &= 0, 1, \dots, s_2, \\
&\;\;\vdots \\
{}_1\varphi_t^{(s_1+2s_2+\cdots+2s_{k-1}+2k-3+\nu)} &= {}_2\varphi_t^{(s_1+2s_2+\cdots+2s_{k-1}+2k-3+\nu)} = t^\nu \cos t\lambda_k, & \nu &= 0, 1, \dots, s_k, \\
{}_1\varphi_t^{(s_1+2s_2+\cdots+2s_{k-1}+s_k+2k-2+\nu)} &= {}_2\varphi_t^{(s_1+2s_2+\cdots+2s_{k-1}+s_k+2k-2+\nu)} = t^\nu \sin t\lambda_k, & \nu &= 0, 1, \dots, s_k,
\end{aligned}
\]

where $\lambda_1 = 0 < \lambda_2 < \cdots < \lambda_q$. The jumps of $M(\lambda)$ are at $0, \pm\lambda_2, \dots, \pm\lambda_q$.
We need only discuss the jumps at nonnegative $\lambda$, since $dM(\lambda) = \overline{dM(-\lambda)}$.
Now

\[
{}_{ij}M_1 = \begin{pmatrix} M_1 & 0 \\ 0 & 0 \end{pmatrix},
\]

where $M_1$ here denotes the matrix

\[
M_1 = \left\{ \frac{\sqrt{(2\mu + 1)(2\nu + 1)}}{\mu + \nu + 1};\ \mu, \nu = 0, 1, \dots, s_1 \right\},
\]

and

\[
{}_{ij}M_k = \frac{1}{2} \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & M_k & -iM_k & 0 \\ 0 & iM_k & M_k & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix},
\]

where

\[
M_k = \left\{ \frac{\sqrt{(2\mu + 1)(2\nu + 1)}}{\mu + \nu + 1};\ \mu, \nu = 0, 1, \dots, s_k \right\}, \qquad k \ne 1,
\]

and the null submatrix in the upper left-hand corner of ${}_{ij}M_k$ is of order
$s_1 + 1 + 2\sum_{j=2}^{k-1} (s_j + 1)$. It is clear that the least-squares estimate is asymptoti-
cally efficient in the case of such a regression, since the conditions (7.4) are satisfied.
As in the case of a complex-valued process, one does not have asymptotic
efficiency of the least-squares estimate if the regression vectors of the two
components are unequal in number. There is, however, an additional restriction
that enters into this context and did not arise in Section 6. If one of the terms in
























the regression is $t^\nu \cos t\lambda$, $\lambda \ne 0$, one must also have the term $t^\nu \sin t\lambda$ for asymp-
totic efficiency of the least-squares estimate. Thus, one does not have asymptotic
efficiency of the least-squares estimate in the case of the regression

\[
{}_1m_t = {}_1c \cos t\lambda, \qquad {}_2m_t = {}_2c \cos t\lambda, \qquad \lambda \ne 0,
\]

and one does in the case of the regression

\[
\begin{aligned}
{}_1m_t &= {}_1c_1 \cos t\lambda + {}_1c_2 \sin t\lambda, \\
{}_2m_t &= {}_2c_1 \cos t\lambda + {}_2c_2 \sin t\lambda, \qquad \lambda \ne 0.
\end{aligned}
\]
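
The role of the sine term can be illustrated numerically. The sketch below assumes a hypothetical real residual process ${}_1x_t = e_{1,t}$, ${}_2x_t = e_{2,t} + b\,e_{1,t-1}$ with independent unit-variance white noises (so that the cross-spectral density is complex and $f_{12}(\lambda) \ne f_{12}(-\lambda)$), and compares the trace of the least-squares covariance matrix with that of the Markov covariance matrix at a moderate sample size. The model, the frequency `lam0`, and the function names are assumptions made for this example.

```python
import numpy as np

def residual_cov(N, b):
    """Covariance of y = (1y, 2y) for 1x_t = e1_t, 2x_t = e2_t + b*e1_{t-1}."""
    S = np.eye(N, k=1)                      # S[s, t] = 1 when t = s + 1
    top = np.hstack([np.eye(N), b * S])     # R_11 = I, R_12[s, t] = b if t = s + 1
    bottom = np.hstack([b * S.T, (1.0 + b * b) * np.eye(N)])
    return np.vstack([top, bottom])

def efficiency_ratio(X, R):
    """trace(LS covariance) / trace(Markov covariance) for Phi = diag(X, X)."""
    N, p = X.shape
    Phi = np.zeros((2 * N, 2 * p))
    Phi[:N, :p] = X
    Phi[N:, p:] = X
    A = np.linalg.solve(Phi.T @ Phi, Phi.T)
    ls = A @ R @ A.T
    markov = np.linalg.inv(Phi.T @ np.linalg.solve(R, Phi))
    return np.trace(ls) / np.trace(markov)

N, b, lam0 = 200, 1.0, 1.0
t = np.arange(1, N + 1)
R = residual_cov(N, b)

cos_only = np.cos(lam0 * t).reshape(-1, 1)
cos_sin = np.column_stack([np.cos(lam0 * t), np.sin(lam0 * t)])

ratio_cos_only = efficiency_ratio(cos_only, R)
ratio_cos_sin = efficiency_ratio(cos_sin, R)
```

For this assumed model a short calculation with the limiting formulas gives a ratio of $1 + b^2 \sin^2\lambda_0$ for the cosine-only regression, whereas the ratio tends to one when the sine term is included.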


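8. The Markov and least-squares estimate when the regression of one
component vanishes. The special case in which the regression of one component
vanishes, say ${}_2m_t \equiv 0$, has not yet been discussed but is of some interest. It is
clear that the least-squares estimate can not be asymptotically efficient for all
admissible spectra of the observed process in this case. Let $c^*$ now denote a
linear unbiased estimate of ${}_1c$. The least-squares estimate of ${}_1c$ in this case is

\[
c_L^* = ({}_1\Phi'\, {}_1\Phi)^{-1}\, {}_1\Phi'\, {}_1y. \tag{8.1}
\]

The covariance matrix of the least-squares estimate is

\[
E(c_L^* - {}_1c)(c_L^* - {}_1c)' = ({}_1\Phi'\, {}_1\Phi)^{-1}\, {}_1\Phi' R_{11}\, {}_1\Phi\, ({}_1\Phi'\, {}_1\Phi)^{-1}. \tag{8.2}
\]

The Markov estimate $c_M^*$ of ${}_1c$ is

\[
c_M^* = \left( ({}_1\Phi' \;\; 0)\, R^{-1} \begin{pmatrix} {}_1\Phi \\ 0 \end{pmatrix} \right)^{-1} ({}_1\Phi' \;\; 0)\, R^{-1} y. \tag{8.3}
\]

The covariance matrix of the Markov estimate is

\[
E(c_M^* - {}_1c)(c_M^* - {}_1c)' = \left( ({}_1\Phi' \;\; 0)\, R^{-1} \begin{pmatrix} {}_1\Phi \\ 0 \end{pmatrix} \right)^{-1}
= \bigl( {}_1\Phi' (R_{11} - R_{12} R_{22}^{-1} R_{21})^{-1}\, {}_1\Phi \bigr)^{-1}. \tag{8.4}
\]

By using techniques analogous to those of Sections 4 and 5, one can show that

\[
\lim_{N \to \infty} {}_1D_N E(c_L^* - {}_1c)(c_L^* - {}_1c)'\, {}_1D_N = 2\pi\, {}_1M^{-1} \int_{-\pi}^{\pi} f_{11}(-\lambda)\, d\,{}_{11}M(\lambda)\; {}_1M^{-1}, \tag{8.5}
\]

while

\[
\lim_{N \to \infty} {}_1D_N E(c_M^* - {}_1c)(c_M^* - {}_1c)'\, {}_1D_N
= 2\pi \left( \int_{-\pi}^{\pi} \frac{f_{22}(-\lambda)}{f_{11}(-\lambda) f_{22}(-\lambda) - |f_{12}(-\lambda)|^2}\, d\,{}_{11}M(\lambda) \right)^{-1}. \tag{8.6}
\]

Of course $c_M^*$ is the best of all linear unbiased estimates of ${}_1c$ in that it has the
smallest covariance matrix of them all. It is interesting to compare the two esti-
mates when ${}_1m_t \equiv c$, a constant. Then

\[
{}_{11}M(\pi) - {}_{11}M(-\pi) = {}_{11}M(0+) - {}_{11}M(0-)
\]

so that

\[
\lim_{N \to \infty} \frac{E(c_M^* - c)^2}{E(c_L^* - c)^2} = 1 - \frac{|f_{12}(0)|^2}{f_{11}(0) f_{22}(0)}.
\]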

Some aspects of this example have been discussed in [1] from the point of view of 
discriminant analysis. 
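
The limiting ratio for a constant regression depends only on the spectral densities at frequency zero. The following Python sketch evaluates it; the function names are illustrative and not part of the paper, and the white-noise densities are an assumed test case (unit variances, same-time correlation rho).

```python
import math


def markov_to_ls_variance_ratio(f11, f22, f12):
    """Limiting ratio E(c_M - c)^2 / E(c_L - c)^2 for a constant regression.

    f11, f22: spectral densities of the two residual components at frequency 0.
    f12: cross-spectral density at frequency 0 (may be complex).
    """
    if f11 <= 0 or f22 <= 0:
        raise ValueError("spectral densities must be positive")
    coherence = abs(f12) ** 2 / (f11 * f22)
    if coherence >= 1:
        raise ValueError("need |f12|^2 < f11 * f22 so that |f(0)| > 0")
    return 1.0 - coherence


def white_noise_densities(rho):
    """Assumed example: unit-variance white noise, same-time correlation rho."""
    c = 1.0 / (2.0 * math.pi)
    return c, c, rho * c


ratio = markov_to_ls_variance_ratio(*white_noise_densities(0.6))
```

For this assumed white-noise case the ratio reduces to 1 - rho^2, so the gain of the Markov estimate over least squares grows with the cross-correlation.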


9. Processes of dimension higher than two. We shall discuss briefly the
case of an n-dimensional process $y_t$ with stationary residuals, $n \geq 3$, and
indicate that most of the results obtained in the two-dimensional case are
still valid. Assumptions analogous to those of Sections 2 and 3 are made. Let

$${}_im_t = \sum_{\nu=1}^{p_i} {}_ic_\nu\;{}_i\varphi_t^{(\nu)}$$

be the regression of the ith component ${}_iy_t$ of $y_t$. The spectrum of the residual
process $x_t = y_t - m_t$ is again assumed absolutely continuous with continuous
spectral densities $f_{ij}(\lambda)$, $i, j = 1, \cdots, n$. Of course $f_{ij}(\lambda)$ is the cross-spectral
density of ${}_ix_t$ and ${}_jx_t$. The determinant $|f(\lambda)|$ of the matrix

(9.1)  $$f(\lambda) = \{f_{ij}(\lambda);\ i, j = 1, \cdots, n\}$$

is assumed to be positive for all $\lambda$.
Let

$${}_ic = \begin{pmatrix}{}_ic_1\\ \vdots\\ {}_ic_{p_i}\end{pmatrix} \qquad\text{and}\qquad c = \begin{pmatrix}{}_1c\\ \vdots\\ {}_nc\end{pmatrix}.$$

Let $c_L^*$ and $c_M^*$ be the least-squares estimate and the Markov estimate of c,
respectively, in terms of the observed process $y_1, \cdots, y_N$. The matrices ${}_i\Phi$,
$i = 1, \cdots, n$, are defined just as in Section 2, and we set

$$\Phi = \begin{pmatrix}{}_1\Phi & & 0\\ & \ddots & \\ 0 & & {}_n\Phi\end{pmatrix}.$$

R is the covariance matrix of the vector $({}_1y', \cdots, {}_ny')'$. The expressions given for $c_L^*$
and $c_M^*$ and their respective covariance matrices in Section 1 are still valid.
The cross-spectral distribution function of the regression vectors of ${}_iy_t$ and
${}_jy_t$ are computed just as in the two-dimensional case. The matrices
${}_{ii}M(\pi) - {}_{ii}M(-\pi) = {}_iM$, $i = 1, \cdots, n$, are assumed to be nonsingular. Let
$M(\lambda) = \{{}_{ij}M(\lambda);\ i, j = 1, \cdots, n\}$. We set

$$D_N = \begin{pmatrix}{}_1D_N & & 0\\ & \ddots & \\ 0 & & {}_nD_N\end{pmatrix}.$$

Let $\theta_{ij}(\lambda)$ be the cofactor of the element $f_{ij}(\lambda)$ in the matrix $f(\lambda)$. By using an
elaboration of the methods of Sections 4 and 5, one can show that

(9.2)  $$\lim_{N\to\infty} D_N\,E(c_L^* - c)(c_L^* - c)'\,D_N = 2\pi M^{-1}\left\{\int_{-\pi}^{\pi} f_{ij}(-\lambda)\,d\,{}_{ij}M(\lambda);\ i, j = 1, \cdots, n\right\}M^{-1},$$

(9.3)  $$\lim_{N\to\infty} D_N\,E(c_M^* - c)(c_M^* - c)'\,D_N = \left\{\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{\theta_{ij}(-\lambda)}{|f(-\lambda)|}\,d\,{}_{ij}M(\lambda);\ i, j = 1, \cdots, n\right\}^{-1}.$$

The analogues of the results obtained in Sections 6 and 7 on asymptotic efficiency
of the least-squares estimate $c_L^*$ are valid in the n-dimensional case and can be proved
in the same way.


10. Concluding remarks. It would be very interesting to find out for what 
sample size N the various results obtained on asymptotic efficiency of the 
least-squares estimate are practically valid. An effective way to find out 
would be to set up a computational program for the calculation of the co- 
variance matrices of the least-squares and Markov estimates for a variety 
of interesting regressions and spectra f(A). The approximations derived in 
this paper for these covariance matrices should also be computed and com- 
pared with the true covariance matrices. 


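Consider the case of a real two-dimensional process $y_t = \begin{pmatrix}{}_1y_t\\ {}_2y_t\end{pmatrix}$ where both
components have constant mean values $E\,{}_iy_t = {}_ic$, $i = 1, 2$, which we want to
estimate. The simplest case of cross-correlation, and a rather uninteresting one,
is that in which

$$\operatorname{cov}({}_iy_t, {}_iy_\tau) = \delta_{t\tau}, \qquad \operatorname{cov}({}_1y_t, {}_2y_\tau) = \rho\,\delta_{t\tau},$$

where $|\rho| < 1$. Nonetheless, it is amusing to note that in this case the
least-squares estimate is efficient for all finite N and that

$$E(c_L^* - c)(c_L^* - c)' = E(c_M^* - c)(c_M^* - c)'$$
$$= 2\pi D_N^{-1}M^{-1}\left\{\int_{-\pi}^{\pi} f_{ij}(-\lambda)\,d\,{}_{ij}M(\lambda);\ i, j = 1, 2\right\}M^{-1}D_N^{-1}$$
$$= 2\pi D_N^{-1}\left\{\int_{-\pi}^{\pi}\frac{\theta_{ij}(-\lambda)}{|f(-\lambda)|}\,d\,{}_{ij}M(\lambda);\ i, j = 1, 2\right\}^{-1}D_N^{-1}$$
$$= \frac{1}{N}\begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix}.$$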
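
The computational program suggested above can be sketched in a few lines. The Python code below is a minimal illustration under stated assumptions, not the author's program: it stacks the two components into one vector, forms the exact covariance matrices $(\Phi'\Phi)^{-1}\Phi'R\Phi(\Phi'\Phi)^{-1}$ and $(\Phi'R^{-1}\Phi)^{-1}$ for a given design $\Phi$ and residual covariance R, and applies them to the white-noise example. All function names are hypothetical, and the Gauss-Jordan inverse is meant only for small matrices.

```python
def matmul(a, b):
    """Product of two dense matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(a)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def estimate_covariances(phi, r):
    """Return (least-squares covariance, Markov covariance) for y = phi c + x."""
    phi_t = transpose(phi)
    g = inverse(matmul(phi_t, phi))
    ls = matmul(matmul(matmul(g, phi_t), r), matmul(phi, g))
    markov = inverse(matmul(matmul(phi_t, inverse(r)), phi))
    return ls, markov


def white_noise_example(n_obs, rho):
    """Design and covariance for constant means, stacked as (1y_1..1y_N, 2y_1..2y_N)."""
    phi = [[1.0, 0.0] for _ in range(n_obs)] + [[0.0, 1.0] for _ in range(n_obs)]
    r = [[0.0] * (2 * n_obs) for _ in range(2 * n_obs)]
    for t in range(n_obs):
        r[t][t] = r[n_obs + t][n_obs + t] = 1.0
        r[t][n_obs + t] = r[n_obs + t][t] = rho
    return phi, r


ls_cov, markov_cov = estimate_covariances(*white_noise_example(5, 0.6))
```

For other regressions and spectra only `phi` and `r` need to change; comparing the two returned matrices for increasing N shows how quickly the asymptotic statements become accurate.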


REFERENCES

[1] W. G. COCHRAN AND C. I. BLISS, "Discriminant functions with covariance," Ann.
Math. Stat., Vol. 19 (1948), pp. 151-176.

[2] H. CRAMÉR, "On the theory of stationary random processes," Ann. Math. Stat., Vol.
41 (1940), pp. 215-230.

[3] U. GRENANDER, "On the estimation of regression coefficients in the case of an
autocorrelated disturbance," Ann. Math. Stat., Vol. 25 (1954), pp. 252-272.

[4] U. GRENANDER AND M. ROSENBLATT, "An extension of a theorem of G. Szegö and its
application to the study of stochastic processes," Trans. Amer. Math. Soc.,
Vol. 76 (1954), pp. 112-126.

[5] U. GRENANDER AND M. ROSENBLATT, "Regression analysis of time series with a
stationary disturbance," Proc. Nat. Acad. Sci. U.S. (to be published), Vol. 40
(1954), pp. 812-816.





AN APPLICATION OF INFORMATION THEORY TO MULTIVARIATE 
ANALYSIS, II 


By S. KULLBACK
The George Washington University 


0. Summary. Certain results of information theory are applied to some 
problems of multivariate analysis, including the multivariate linear hypothesis 
and the hypothesis of homogeneity of covariance matrices. A discussion of 
certain related linear discriminant functions is also included. Some asymptotic 
distributions on the null hypothesis are derived. Related problems, still under 
investigation, are mentioned. The procedures are based on the principle of 
maximizing information. For the cases considered, the estimates of I(1:2)
and J(1,2) turn out to be those obtained by replacing the parameters by
unbiased estimates, appropriate to the hypotheses under consideration.

1. Introduction. In a previous paper [20], the author considered certain 
results of information theory as applied to multivariate normal populations. 
In particular there was examined the problem of finding the “‘best’’ linear 
function for discriminating between two normal populations, assuming equal 
means but different population covariance matrices. The multivariate analysis 
techniques of discriminant analysis, principal components, and canonical 
correlations were seen to be closely related concepts. (Greenhouse [12], using 
the information-theory approach, has examined the problem of finding the 
“‘best”’ linear function for discriminating between two multivariate normal 
populations, with no restrictive assumptions as to means or covariance 
matrices.) 

In [20], the discussion was in terms of population parameters, and questions 
of estimation and distribution were omitted. In addition to discussing some of 
the problems of estimation and distribution herein, we also want to consider 
further application of information theory, and the relation with previous de- 
velopments, by studying the following four multivariate problems (cf. Roy 
[28], Section 5.1): 

(a) The hypothesis that a k-variate normal population has the covariance
matrix $\sigma$;

(b) The hypothesis of equality of r means for each of k variates for r k-vari- 
ate normal populations with different covariance matrices, and with 
the same covariance matrix; 

(c) The multivariate linear hypothesis, including the case of a subhy- 
pothesis; 

(d) The hypothesis of equality of the covariance matrices of r k-variate 
normal populations. 

The reader is referred to Section 2 of [20] in which the information measures
are defined and their properties summarized, with particular reference to
property (iii) on p. 90 of [20]. (Proofs may be found in [22].) Based on the
non-decreasing property of I(1:2) and J(1,2) for sufficient statistics, we follow a
principle that may be termed maximizing information in order to attain
sufficiency or near sufficiency. It seems intuitively reasonable that such an
approach should have certain optimum properties. In certain cases the results are
closely related to likelihood ratio tests and this relation is under investigation
for the general case. Asymptotic distributions occurring herein verify conclusions
derivable from a general asymptotic theory, the details of which are in
preparation (cf. Wilks [32]), and in two cases a better approximation than the general
theory provides is derived.

Received March 26, 1954; revised September 8, 1955.

It might be remarked that appropriate multivariate extensions of the results 
in Cramér [6], particularly pp. 11 and 12, and Daniels [7] also provide an alterna- 
tive basis for a general asymptotic theory. 

There are few general (automatic) procedures for finding test criteria. The 
approach using information theory as a means of determining test procedures 
may be of interest and use because it is, so to speak, an automatic procedure. 

Matrix notation, methods, and results are used and assumed known to the 
reader. The notation, and such results of [20] as are needed, will be used without 
further summary herein. 


2. Components of information. Since I(1:2) and J(1,2) are additive for
independent random variables, for a random sample of n observations $I_n(1:2) = nI(1:2)$
and $J_n(1,2) = nJ(1,2)$, where I(1:2) and J(1,2) are given, respectively,
by (2.8) and (2.7) of [20].

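As is well known, the averages and the variances and covariances in samples
from a multivariate normal population are independently distributed,
respectively, in a normal distribution and in the Wishart distribution (see, for
example, Wilks [33]). Computing the appropriate values from the respective
distributions, it is readily found (see, for example, Hoyt [14]), that for the
averages

(2.1)  $$I'(1:2; \bar x) = \frac12\log\frac{|\sigma_{(2)}|}{|\sigma_{(1)}|} - \frac k2 + \frac12\operatorname{tr}\sigma_{(1)}\sigma_{(2)}^{-1} + \frac n2\,\delta'\sigma_{(2)}^{-1}\delta$$
$$\phantom{I'(1:2; \bar x)} = \frac n2\,\delta'\sigma^{-1}\delta \quad\text{for } \sigma_{(1)} = \sigma_{(2)} = \sigma,$$

(2.2)  $$J'(1,2; \bar x) = \frac12\operatorname{tr}\,(\sigma_{(1)} - \sigma_{(2)})(\sigma_{(2)}^{-1} - \sigma_{(1)}^{-1}) + \frac n2\,\delta'(\sigma_{(1)}^{-1} + \sigma_{(2)}^{-1})\delta$$
$$\phantom{J'(1,2; \bar x)} = n\,\delta'\sigma^{-1}\delta \quad\text{for } \sigma_{(1)} = \sigma_{(2)} = \sigma,$$

where $\delta = \mu_{(1)} - \mu_{(2)}$, and for the sample unbiased variances and covariances,

(2.3)  $$I'(1:2; S) = \frac{n-1}{2}\left(\log\frac{|\sigma_{(2)}|}{|\sigma_{(1)}|} - k + \operatorname{tr}\sigma_{(1)}\sigma_{(2)}^{-1}\right),$$

(2.4)  $$J'(1,2; S) = \frac{n-1}{2}\operatorname{tr}\,(\sigma_{(1)} - \sigma_{(2)})(\sigma_{(2)}^{-1} - \sigma_{(1)}^{-1}).$$

We thus have from the preceding,

$$nI(1:2) = I'(1:2; \bar x) + I'(1:2; S),$$
$$nJ(1,2) = J'(1,2; \bar x) + J'(1,2; S).$$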
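
The additivity of the two components can be checked numerically. The sketch below is restricted to k = 2 so that determinants and inverses have closed forms; it assumes the usual expression for I(1:2) between two normal populations (the form of (2.8) of [20] is taken for granted here), and all function names are illustrative.

```python
import math


def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def inv2(a):
    d = det2(a)
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]


def trace_prod(a, b):
    """tr(ab) for 2 x 2 matrices."""
    return sum(a[i][j] * b[j][i] for i in range(2) for j in range(2))


def quad_form(v, a):
    return sum(v[i] * a[i][j] * v[j] for i in range(2) for j in range(2))


def cov_term(s1, s2):
    """log|s2|/|s1| - k + tr(s1 s2^-1) with k = 2."""
    return math.log(det2(s2) / det2(s1)) - 2 + trace_prod(s1, inv2(s2))


def info_per_observation(mu1, s1, mu2, s2):
    """I(1:2) for one observation (assumed standard normal-theory form)."""
    delta = [mu1[0] - mu2[0], mu1[1] - mu2[1]]
    return 0.5 * cov_term(s1, s2) + 0.5 * quad_form(delta, inv2(s2))


def info_means(n, mu1, s1, mu2, s2):
    """I'(1:2; x-bar) as in (2.1)."""
    delta = [mu1[0] - mu2[0], mu1[1] - mu2[1]]
    return 0.5 * cov_term(s1, s2) + 0.5 * n * quad_form(delta, inv2(s2))


def info_covariances(n, s1, s2):
    """I'(1:2; S) as in (2.3), with n - 1 degrees of freedom."""
    return 0.5 * (n - 1) * cov_term(s1, s2)


def divergence_covariances(n, s1, s2):
    """J'(1,2; S) as in (2.4)."""
    diff = [[s1[i][j] - s2[i][j] for j in range(2)] for i in range(2)]
    i1, i2 = inv2(s1), inv2(s2)
    inv_diff = [[i2[i][j] - i1[i][j] for j in range(2)] for i in range(2)]
    return 0.5 * (n - 1) * trace_prod(diff, inv_diff)
```

With any positive definite pair of covariance matrices, `n * info_per_observation(...)` agrees with `info_means(...) + info_covariances(...)` up to rounding, which is the first identity displayed above.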

3. Estimates of information. The procedure we shall use (replacing
population parameters in I(1:2), J(1,2) by unbiased estimates appropriate to the
hypotheses) is based on a principle of maximizing information, as may be seen
by the following heuristic discussion.

Suppose that $g_2(y)$ and $g^*(y)$ are densities, satisfying the conditions of Section 4
of [21], such that for given $g_2(y)$ we require

(3.1)  $$I^* = \int g^*(y)\log\frac{g^*(y)}{g_2(y)}\,d\gamma(y)$$

to be a maximum, subject to

(3.2)  $$\int g^*(y)\,d\gamma(y) = 1, \qquad \int y\,g^*(y)\,d\gamma(y) = a.$$

This is equivalent to maximizing

(3.3)  $$U = \int\left(g^*(y)\log\frac{g^*(y)}{g_2(y)} + k\,g^*(y) + l\,y\,g^*(y)\right)d\gamma(y),$$

where k and l are arbitrary constants to be determined. The usual variational
procedures lead to

(3.4)  $$\delta U = 0 = \int \delta g^*(y)\left[\log\frac{g^*(y)}{g_2(y)} + 1 + k + ly\right]d\gamma(y)$$

or

(3.5)  $$\log\frac{g^*(y)}{g_2(y)} + 1 + k + ly = 0.$$

This means that

(3.6)  $$g^*(y) = e^{-1-k-ly}\,g_2(y)$$

or, by integration, that

(3.7)  $$1 = e^{-1-k}\int e^{-ly}g_2(y)\,d\gamma(y) = e^{-1-k}M_2(t),$$

where we have replaced $-l$ by $t$.
Thus,

(3.8)  $$g^*(y) = \frac{e^{ty}g_2(y)}{M_2(t)}, \qquad M_2(t) = \int e^{ty}g_2(y)\,d\gamma(y);$$

since this means that

(3.9)  $$I^* = \int g^*(y)\log\frac{g^*(y)}{g_2(y)}\,d\gamma(y) = at - \log M_2(t),$$

the desired maximum occurs for that value of t which maximizes $at - \log M_2(t)$,
or when $I^* = -\log m_2(a)$, in the notation of Section 4 of [21]. (It might be
noted that $k\log m_2(a) = -kI^*$, where k is Boltzmann's constant, is the entropy
of the distribution whose density is $g_2(y)$ (cf. Khinchin [18]).) Also,

(3.10)  $$J^* = \int (g^*(y) - g_2(y))\log\frac{g^*(y)}{g_2(y)}\,d\gamma(y) = t(a)\,(a - E_2(y)).$$
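
The conjugate distribution (3.8) and the value (3.9) are easy to compute for a discrete $g_2$. The sketch below is an illustration only: it assumes $g_2$ is the uniform distribution on the faces of a die (not an example from the paper), finds t by bisection so that the mean of $g^*$ equals a, and compares $at - \log M_2(t)$ with the directly computed integral (3.1). Function names are hypothetical.

```python
import math


def mgf(t, ys, g2):
    """M_2(t) = sum of exp(t y) g2(y) over the support ys."""
    return sum(p * math.exp(t * y) for y, p in zip(ys, g2))


def conjugate(t, ys, g2):
    """g*(y) = exp(t y) g2(y) / M_2(t), as in (3.8)."""
    m = mgf(t, ys, g2)
    return [p * math.exp(t * y) / m for y, p in zip(ys, g2)]


def mean(ys, g):
    return sum(y * p for y, p in zip(ys, g))


def solve_t(a, ys, g2, lo=-20.0, hi=20.0, iterations=200):
    """Bisection for t(a); the mean of g* is increasing in t."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mean(ys, conjugate(mid, ys, g2)) < a:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def i_star(a, ys, g2):
    """Return (t, a t - log M_2(t)) at the solution t = t(a), as in (3.9)."""
    t = solve_t(a, ys, g2)
    return t, a * t - math.log(mgf(t, ys, g2))


faces = [1, 2, 3, 4, 5, 6]
uniform = [1.0 / 6.0] * 6
t_a, value = i_star(4.5, faces, uniform)
```

Because $\log M_2(t)$ is convex, $at - \log M_2(t)$ is concave in t and the stationary point found by the bisection is its maximum.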

For a simple hypothesis, the parameters of $g_2(y)$ are completely specified.
For a composite hypothesis, say $\theta \in \omega$, where $\theta$ is a vector of the parameters
and $\omega$ is some subset of the parameter space, we use as the appropriate test of
the null hypothesis (that relative to $g_2(y)$) the value given by

(3.11)  $$\hat I = \min_{\theta\in\omega} I^* = I^*(\hat\theta)$$

and, correspondingly,

(3.12)  $$\hat J = J^*(\hat\theta).$$

We shall carry through the foregoing in detail for some cases; in others, we
shall apply the procedure of replacement of the parameters by unbiased
estimates.


4. Single sample. Consider problem (a) of Section 1. For a sample of n
observations from a multivariate normal population with mean matrix
$\mu' = (\mu_1, \mu_2, \cdots, \mu_k)$ and covariance matrix $\sigma$, the moment-generating function of
the sample averages $\bar x' = (\bar x_1, \bar x_2, \cdots, \bar x_k)$ and $V_{ii}$ and $2V_{ij}$, $i \neq j$, the
elements of the matrix $V = NS$, where N is the number of degrees of freedom and
S is the sample unbiased covariance matrix, is known to be given by ([33], p. 121)

(4.1)  $$M_2(t, T) = |I - 2\sigma T|^{-N/2}\exp\left(t'\mu + \frac{1}{2n}\,t'\sigma t\right),$$

where $t' = (t_1, t_2, \cdots, t_k)$, $T = (t_{ij})$, $i, j = 1, 2, \cdots, k$.

We want to determine $g^*$ of (3.9) (the conjugate distribution of Khinchin [18])
so as to have the observed unbiased estimates as its parameters, that is to say,
$a = (\bar x, V)$, which means that we seek the values of t and T which will maximize
(cf. [18], Section 33)

(4.2)  $$I^* = t'\bar x + \operatorname{tr} TV - t'\mu - \frac{1}{2n}\,t'\sigma t + \frac N2\log|I - 2\sigma T|.$$

Differentiating with respect to t and T (see [8], p. 364), we have

(4.3)  $$\bar x - \mu - \frac{\sigma t}{n} = 0, \qquad \operatorname{tr}\,(dT)V - N\operatorname{tr}\,(I - 2\sigma T)^{-1}\sigma\,(dT) = 0,$$

from which are derived

(4.4)  $$t = n\sigma^{-1}(\bar x - \mu), \qquad T = \tfrac12\sigma^{-1} - \tfrac12 S^{-1}.$$

Using the values given by (4.4), (4.2) becomes

(4.5)  $$I^* = \frac n2(\bar x - \mu)'\sigma^{-1}(\bar x - \mu) + \frac N2\left(\log\frac{|\sigma|}{|S|} - k + \operatorname{tr} S\sigma^{-1}\right).$$

The null hypothesis specifies $\sigma$ but not $\mu$. It is clear that for variations of $\mu$,
$I^*$ in (4.5) is a minimum for $\hat\mu = \bar x$, or

(4.6)  $$\hat I = \min I^* = I^*(\hat\mu) = \frac N2\left(\log\frac{|\sigma|}{|S|} - k + \operatorname{tr} S\sigma^{-1}\right).$$

Note that (4.6) is (2.3) with $\sigma_{(1)} = S$, $\sigma_{(2)} = \sigma$.

The case of a single sample of n observations from a k-variate normal
population was considered in some detail by Hoyt [14]. Hoyt showed that
asymptotically $2\hat I$ has a chi-square distribution with k(k + 1)/2 d.f., and to a closer
approximation, R. A. Fisher's B distribution.

In considering tests of significance in factor analysis, Bartlett [4], using a
"homogeneous" likelihood function, and Rippe [26], using the likelihood-ratio
test procedure for the problem of tests of significance of components in matrix
factorization, arrived at the statistic $2\hat I$ and the same conclusion as to its
asymptotic chi-square distribution.

For the hypothesis of independence of variates, i.e.,

(4.7)  $$\sigma = (\sigma_{ij}), \qquad \sigma_{ij} = 0, \quad i \neq j,$$

we may write (4.6) as

(4.8)  $$2\hat I = -N\log|R| + N\sum_{i=1}^{k}\left[\frac{s_{ii}}{\sigma_{ii}} + \log\frac{\sigma_{ii}}{s_{ii}} - 1\right],$$

where R is the matrix of sample correlation coefficients, or

(4.9)  $$2\hat I = 2\hat I_R + 2\hat I_S,$$

where $2\hat I_R = -N\log|R|$ (cf. [20], p. 94). Wilks [31] has shown that when (4.7)
holds, the $s_{ii}$ and $r_{ij}$ are independent, so that $\hat I_R$ and $\hat I_S$ are independent. It
follows from the discussion of Section 9 that, asymptotically, $2\hat I_R$ has a chi-square
distribution with k(k - 1)/2 d.f., and $2\hat I_S$ has a chi-square distribution with
k d.f. It is shown in Section 9 that a better approximation to the distribution of
$2\hat I_R$ is given by Fisher's B distribution ([10], p. 14.665) with

$$\beta^2 = k(k - 1)(2k + 5)/12N, \qquad B^2 = 2\hat I_R, \qquad m = k(k - 1)/2.$$
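
The decomposition (4.8) and (4.9) can be verified numerically. The sketch below is restricted to k = 2, where $|R| = 1 - r_{12}^2$; the function names and the sample values used in the check are illustrative assumptions, not data from the paper.

```python
import math


def two_i_hat(n_df, s, sigma_diag):
    """2 I-hat of (4.6) for k = 2 and a diagonal hypothesized covariance matrix."""
    det_s = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    det_sigma = sigma_diag[0] * sigma_diag[1]
    trace = s[0][0] / sigma_diag[0] + s[1][1] / sigma_diag[1]
    return n_df * (math.log(det_sigma / det_s) - 2 + trace)


def two_i_hat_r(n_df, s):
    """2 I-hat_R = -N log|R|, with |R| = 1 - r12^2 when k = 2."""
    r12_squared = s[0][1] ** 2 / (s[0][0] * s[1][1])
    return -n_df * math.log(1.0 - r12_squared)


def two_i_hat_s(n_df, s, sigma_diag):
    """2 I-hat_S: the part of (4.8) depending only on the variances."""
    total = 0.0
    for i in range(2):
        ratio = s[i][i] / sigma_diag[i]
        total += ratio - math.log(ratio) - 1.0
    return n_df * total
```

The first component measures departure from zero correlation and the second measures departure of the sample variances from their hypothesized values; both are nonnegative.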


5. Homogeneity of means. Consider problem (b) of Section 1. We will first
discuss the case for two samples. Suppose we have two independent samples,
having, respectively, $n_1$, $n_2$ independent observations from k-variate normal
populations with respective covariance matrices $\Sigma_1$, $\Sigma_2$. We want to test the
null hypothesis that the population mean vectors are equal, i.e.

(5.1)  $$H_2\colon\ \mu_{(1)} = \mu_{(2)} = \mu;\ \Sigma_1, \Sigma_2,$$

with no specification about $\Sigma_1$ and $\Sigma_2$.
Using the notation already introduced in Section 4, we want to determine $g^*$,
with $a = (\bar x_{(1)}, \bar x_{(2)}, V_1, V_2)$, which means that we seek the values of $t_{(i)}$, $T_i$,
$i = 1, 2$, which will maximize

(5.2)  $$I^* = t_{(1)}'\bar x_{(1)} - t_{(1)}'\mu - \frac{1}{2n_1}t_{(1)}'\Sigma_1 t_{(1)} + \operatorname{tr} T_1V_1 + \frac{N_1}{2}\log|I - 2\Sigma_1T_1|$$
$$\qquad + t_{(2)}'\bar x_{(2)} - t_{(2)}'\mu - \frac{1}{2n_2}t_{(2)}'\Sigma_2 t_{(2)} + \operatorname{tr} T_2V_2 + \frac{N_2}{2}\log|I - 2\Sigma_2T_2|.$$

Following the procedure as used for (4.3), we find that the sought for values are
given by

(5.3)  $$t_{(1)} = n_1\Sigma_1^{-1}(\bar x_{(1)} - \mu), \qquad t_{(2)} = n_2\Sigma_2^{-1}(\bar x_{(2)} - \mu),$$
$$T_1 = \tfrac12\Sigma_1^{-1} - \tfrac12 S_1^{-1}, \qquad T_2 = \tfrac12\Sigma_2^{-1} - \tfrac12 S_2^{-1},$$

for which values $I^*$ of (5.2) becomes

(5.4)  $$I^* = \frac{n_1}{2}(\bar x_{(1)} - \mu)'\Sigma_1^{-1}(\bar x_{(1)} - \mu) + \frac{n_2}{2}(\bar x_{(2)} - \mu)'\Sigma_2^{-1}(\bar x_{(2)} - \mu)$$
$$\qquad + \frac{N_1}{2}\left(\log\frac{|\Sigma_1|}{|S_1|} - k + \operatorname{tr} S_1\Sigma_1^{-1}\right) + \frac{N_2}{2}\left(\log\frac{|\Sigma_2|}{|S_2|} - k + \operatorname{tr} S_2\Sigma_2^{-1}\right).$$

The null hypothesis requires equality of the means with no specification on
the covariance matrices. It is clear that for variations of $\Sigma_1$ and $\Sigma_2$, $I^*$ will be
a minimum for $\hat\Sigma_1 = S_1$, $\hat\Sigma_2 = S_2$, and for $\hat\mu$ satisfying

(5.5)  $$0 = n_1S_1^{-1}(\bar x_{(1)} - \hat\mu) + n_2S_2^{-1}(\bar x_{(2)} - \hat\mu)$$

or

(5.6)  $$\hat\mu = (n_1S_1^{-1} + n_2S_2^{-1})^{-1}(n_1S_1^{-1}\bar x_{(1)} + n_2S_2^{-1}\bar x_{(2)}).$$

For convenience let $d = \bar x_{(1)} - \bar x_{(2)}$, $A = n_1S_1^{-1}$, $B = n_2S_2^{-1}$; substituting
in (5.4) we get

$$2\hat I(\hat\mu, S_1, S_2) = \operatorname{tr}\{[B(A + B)^{-1}A(A + B)^{-1}B + A(A + B)^{-1}B(A + B)^{-1}A]\,dd'\}.$$

But $B(A + B)^{-1}A = [A^{-1}(A + B)B^{-1}]^{-1} = (B^{-1} + A^{-1})^{-1}$ and $A(A + B)^{-1}B =
[B^{-1}(A + B)A^{-1}]^{-1} = (B^{-1} + A^{-1})^{-1}$, so that finally

$$2\hat I = \operatorname{tr}\{(B^{-1} + A^{-1})^{-1}dd'\} = d'(B^{-1} + A^{-1})^{-1}d = (\bar x_{(1)} - \bar x_{(2)})'\left(\frac{S_1}{n_1} + \frac{S_2}{n_2}\right)^{-1}(\bar x_{(1)} - \bar x_{(2)}).$$

It is readily found that in this case $\hat J = 2\hat I$.

For the single variate case, see Fisher [10], pp. 35.174-35.180.
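
The reduction from (5.4), evaluated at $\hat\mu$ of (5.6), to the final quadratic form can be checked numerically. The sketch below is limited to k = 2 with closed-form inverses; the function names and the sample values in the check are illustrative assumptions.

```python
def inv2(a):
    d = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]


def mat_vec(a, v):
    return [a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1]]


def quad_form(v, a):
    w = mat_vec(a, v)
    return v[0] * w[0] + v[1] * w[1]


def combine(c1, a, c2, b):
    """c1 * a + c2 * b for 2 x 2 matrices."""
    return [[c1 * a[i][j] + c2 * b[i][j] for j in range(2)] for i in range(2)]


def two_i_hat_direct(n1, xbar1, s1, n2, xbar2, s2):
    """d'(S1/n1 + S2/n2)^-1 d, the final form of 2 I-hat."""
    d = [xbar1[0] - xbar2[0], xbar1[1] - xbar2[1]]
    return quad_form(d, inv2(combine(1.0 / n1, s1, 1.0 / n2, s2)))


def two_i_hat_via_mu_hat(n1, xbar1, s1, n2, xbar2, s2):
    """Twice (5.4) at Sigma_i = S_i and mu = mu-hat of (5.6)."""
    a = combine(n1, inv2(s1), 0.0, s1)   # A = n1 S1^-1
    b = combine(n2, inv2(s2), 0.0, s2)   # B = n2 S2^-1
    rhs = [u + v for u, v in zip(mat_vec(a, xbar1), mat_vec(b, xbar2))]
    mu_hat = mat_vec(inv2(combine(1.0, a, 1.0, b)), rhs)
    r1 = [xbar1[0] - mu_hat[0], xbar1[1] - mu_hat[1]]
    r2 = [xbar2[0] - mu_hat[0], xbar2[1] - mu_hat[1]]
    return quad_form(r1, a) + quad_form(r2, b)
```

The two functions return the same value up to rounding, which is the content of the matrix identity $B(A + B)^{-1}A = (A^{-1} + B^{-1})^{-1}$ used above.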

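Linear discriminant function. Consider $y = \alpha'x = \alpha_1x_1 + \alpha_2x_2 + \cdots + \alpha_kx_k$,
the same linear compound for each sample. Since y is normally distributed, we
seek $\alpha$ so as to maximize

(5.9)  $$2\hat I'(\hat\mu, S_1, S_2; y) = \frac{\alpha'dd'\alpha}{\alpha'\left(\dfrac{S_1}{n_1} + \dfrac{S_2}{n_2}\right)\alpha}.$$

As is easily determined (cf. [20], p. 91), the maximum occurs for

$$\alpha = \left(\frac{S_1}{n_1} + \frac{S_2}{n_2}\right)^{-1}d \qquad\text{and}\qquad 2\hat I'(y) = 2\hat I.$$

r-samples. Suppose we have r independent samples, having, respectively,
$n_i$, $i = 1, 2, \cdots, r$, independent observations each, from k-variate normal
populations with respective covariance matrices $\Sigma_i$, $i = 1, 2, \cdots, r$, and we want
to test the null hypothesis that the population mean vectors are equal, i.e.

(5.10)  $$H_2\colon\ \mu_{(1)} = \mu_{(2)} = \cdots = \mu_{(r)} = \mu;\ \Sigma_1, \Sigma_2, \cdots, \Sigma_r,$$

with no specification about the $\Sigma_i$.
Without repeating the details, we find, in this case, that

(5.11)  $$I^* = \sum_{i=1}^{r}\frac{n_i}{2}(\bar x_{(i)} - \mu)'\Sigma_i^{-1}(\bar x_{(i)} - \mu) + \sum_{i=1}^{r}\frac{N_i}{2}\left(\log\frac{|\Sigma_i|}{|S_i|} - k + \operatorname{tr} S_i\Sigma_i^{-1}\right),$$

(5.12)  $$\hat\Sigma_i = S_i, \qquad \hat\mu = \left(\sum_{i=1}^{r} n_iS_i^{-1}\right)^{-1}\left(\sum_{i=1}^{r} n_iS_i^{-1}\bar x_{(i)}\right) = \bar x,$$

(5.13)  $$2\hat I = \sum_{i=1}^{r} n_i(\bar x_{(i)} - \bar x)'S_i^{-1}(\bar x_{(i)} - \bar x).$$

On the null hypothesis, $2\hat I$ has an asymptotic chi-square distribution with
(r - 1)k d.f.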

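Covariance matrices equal. If we assume that the population covariance
matrices are equal, i.e., that $\Sigma_1 = \Sigma_2 = \cdots = \Sigma_r = \Sigma$, and want to test the null
hypothesis that the population mean vectors are equal, then, without repeating
the details, we find, in this case, that

(5.14)  $$I^* = \sum_{i=1}^{r}\frac{n_i}{2}(\bar x_{(i)} - \mu)'\Sigma^{-1}(\bar x_{(i)} - \mu) + \frac N2\left(\log\frac{|\Sigma|}{|S|} - k + \operatorname{tr} S\Sigma^{-1}\right),$$

where $NS = N_1S_1 + \cdots + N_rS_r$, $N = N_1 + N_2 + \cdots + N_r$, and that

(5.15)  $$\hat\Sigma = S, \qquad n\hat\mu = n_1\bar x_{(1)} + n_2\bar x_{(2)} + \cdots + n_r\bar x_{(r)} = n\bar x, \qquad n = n_1 + n_2 + \cdots + n_r,$$

(5.16)  $$2\hat I = \sum_{i=1}^{r} n_i(\bar x_{(i)} - \bar x)'S^{-1}(\bar x_{(i)} - \bar x) = \operatorname{tr} S^{-1}(n_1d_{(1)}d_{(1)}' + \cdots + n_rd_{(r)}d_{(r)}') = \operatorname{tr} S^{-1}S^*,$$

where $d_{(i)} = \bar x_{(i)} - \bar x$, $S^* = \sum_{i=1}^{r} n_id_{(i)}d_{(i)}'$ (cf. Hotelling [13]). It is readily
found that in this case $\hat J = 2\hat I$.
Asymptotically, $2\hat I = \hat J$ has a chi-square distribution with k(r - 1) d.f. on
the null hypothesis (cf. [25], p. 372). This will be shown in Section 10.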
Linear discriminant function. Consider $y = \alpha'x = \alpha_1x_1 + \alpha_2x_2 + \cdots + \alpha_kx_k$,
the same linear compound for each sample. Since y is normally distributed, we
have for the y's, using (5.16),

$$2\hat I'(y) = \frac{n_1(\alpha'd_{(1)})^2 + \cdots + n_r(\alpha'd_{(r)})^2}{\alpha'S\alpha} = \frac{\alpha'(n_1d_{(1)}d_{(1)}' + \cdots + n_rd_{(r)}d_{(r)}')\alpha}{\alpha'S\alpha} = \frac{\alpha'S^*\alpha}{\alpha'S\alpha},$$

with the symbols as defined in (5.16). For the linear compound for which $2\hat I'(y)$
is a maximum, the usual calculus procedures yield the result that the $\alpha$'s must
satisfy $S^*\alpha = lS\alpha$, where l is the largest root of the equation $|S^* - lS| = 0$,
which has (almost everywhere) p positive and (k - p) zero roots, where
$p \leq \min(k, r - 1)$ (cf. [28]). If we denote the positive roots in descending order as
$l_1, l_2, \cdots, l_p$,

(5.18)  $$2\hat I = \hat J = \operatorname{tr} S^{-1}S^* = l_1 + l_2 + \cdots + l_p = \hat J'(l_1) + \hat J'(l_2) + \cdots + \hat J'(l_p).$$

The discrimination efficiency of the linear compound associated with $l_i$ is
given by

(5.19)  $$\frac{\hat J'(l_i)}{\hat J} = \frac{l_i}{l_1 + l_2 + \cdots + l_p}.$$

In this case, asymptotically we have, on the null hypothesis, the chi-square
decomposition (cf. [25], p. 373)

    $\hat J'(l_p) = l_p$,                       $|k - (r - 1)| + 1$ d.f.
    $\hat J'(l_{p-1}) = l_{p-1}$,               $|k - (r - 1)| + 3$ d.f.
    $\hat J = l_1 + l_2 + \cdots + l_p$,        $k(r - 1)$ d.f.

This is to be taken in the sense that $l_{m+1} + \cdots + l_p$ is asymptotically a
chi-square, not that $l_{m+1}, \cdots, l_p$ have asymptotic independent chi-square
distributions (see (10.3)).

EXAMPLE. Consider the following data from a problem discussed by Bartlett
and which was also considered in [20], p. 93. (See further references therein.)
Here, r = 8, k = 2, $n = n_1 + \cdots + n_8 = 57$,

$$49S = \begin{pmatrix}136{,}972.6 & 58{,}549.0\\ 58{,}549.0 & 71{,}496.1\end{pmatrix}, \qquad S^* = \begin{pmatrix}12{,}496.8 & -6{,}786.6\\ -6{,}786.6 & 32{,}985.0\end{pmatrix},$$

and the roots of $|S^* - lS| = 0$ are given by $l_1 = 44.68667$, $l_2 = 3.09106$.
Also,

    $\hat J'(l_2) = 3.09106$      6 d.f.
    $\hat J'(l_1) = 44.68667$     8 d.f.
    $\hat J = 47.77773$          14 d.f.

Since only $\hat J'(l_1)$ is significant, the linear discriminant function $y = x_2 -
0.535x_1$, associated with $l_1$, is affected by the treatments and is practically
sufficient.
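
For k = 2 the determinantal equation $|S^* - lS| = 0$ is a quadratic in l, so the example can be recomputed directly. The sketch below assumes the matrix entries 49S and S* as they are read from the example (the off-diagonal entry of 49S is taken as 58,549.0) and uses hypothetical function names; small differences from the printed roots in the later decimals are to be expected from rounding in the published figures.

```python
import math


def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def generalized_roots(s_star, s):
    """Roots of |S* - l S| = 0 for symmetric 2 x 2 matrices, largest first."""
    a = det2(s)
    b = -(s_star[0][0] * s[1][1] + s_star[1][1] * s[0][0]
          - 2.0 * s_star[0][1] * s[0][1])
    c = det2(s_star)
    disc = math.sqrt(b * b - 4.0 * a * c)
    return (-b + disc) / (2.0 * a), (-b - disc) / (2.0 * a)


def discriminant_ratio(s_star, s, root):
    """alpha_1 / alpha_2 from the first row of (S* - l S) alpha = 0."""
    return -(s_star[0][1] - root * s[0][1]) / (s_star[0][0] - root * s[0][0])


within = [[136972.6, 58549.0], [58549.0, 71496.1]]      # 49 S (assumed reading)
s = [[v / 49.0 for v in row] for row in within]
s_star = [[12496.8, -6786.6], [-6786.6, 32985.0]]

l1, l2 = generalized_roots(s_star, s)
ratio = discriminant_ratio(s_star, s, l1)
efficiency = l1 / (l1 + l2)   # (5.19) for the first compound
```

The ratio of coefficients comes out close to -0.535, in agreement with the discriminant function quoted in the example, and the first compound accounts for most of $\hat J$.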


6. Multivariate linear hypothesis. Consider problem (c) of Section 1. Let
$Z_{(i)} = Y_{(i)} - BX_{(i)}$, $i = 1, 2, \cdots, n$, where $Z_{(i)}' = (z_{i1}, \cdots, z_{ik_2})$,
$Y_{(i)}' = (y_{i1}, \cdots, y_{ik_2})$, $X_{(i)}' = (x_{i1}, \cdots, x_{ik_1})$, $B = (\beta_{rs})$, $r = 1, 2, \cdots, k_2$,
$s = 1, 2, \cdots, k_1$, $k_1 \geq k_2$, and the $Z_{(i)}$ are independent $k_2$-variate normal
random vectors with zero means and common covariance matrix $\Sigma$. The $Y_{(i)}$ are
stochastic and the $X_{(i)}$ are considered known. The usual unbiased estimate of B
is given by (see [2], pp. 103-104) $\hat B = (Y'X)(X'X)^{-1}$, where

$$Y' = (Y_{(1)}, Y_{(2)}, \cdots, Y_{(n)}), \qquad X' = (X_{(1)}, X_{(2)}, \cdots, X_{(n)}),$$

and that of $\Sigma$ is given by $(n - k_1)\hat\Sigma = \hat Z'\hat Z = (Y' - \hat BX')(Y - X\hat B') =
Y'Y - \hat BX'X\hat B' = Y'Y - (Y'X)(X'X)^{-1}(X'Y)$.
Let us now consider the hypotheses

$$H_1\colon\ E_1(Y_{(i)}) = BX_{(i)},$$
$$H_2\colon\ E_2(Y_{(i)}) = 0.$$

As in pp. 90-91 of [20], we have

(6.1)  $$2I(1:2) = J(1,2) = \delta'\sigma^{-1}\delta,$$

where

$$\delta' = (X_{(1)}'B', X_{(2)}'B', \cdots, X_{(n)}'B'), \qquad \sigma = \begin{pmatrix}\Sigma & & 0\\ & \ddots & \\ 0 & & \Sigma\end{pmatrix},$$

so that

$$J(1,2) = X_{(1)}'B'\Sigma^{-1}BX_{(1)} + \cdots + X_{(n)}'B'\Sigma^{-1}BX_{(n)}$$
$$\phantom{J(1,2)} = \operatorname{tr}\Sigma^{-1}B(X_{(1)}X_{(1)}' + \cdots + X_{(n)}X_{(n)}')B'$$
$$\phantom{J(1,2)} = \operatorname{tr}\Sigma^{-1}BX'XB'.$$

Using the estimates given above, we get as the estimate of J(1,2) (cf. [20],
p. 96)

(6.3)  $$2\hat I(1:2) = \hat J(1,2) = \operatorname{tr}\hat\Sigma^{-1}\hat BX'X\hat B'$$
$$\qquad = (n - k_1)\operatorname{tr}\,(Y'Y - (Y'X)(X'X)^{-1}(X'Y))^{-1}(Y'X)(X'X)^{-1}(X'Y)$$
$$\qquad = (n - k_1)\operatorname{tr} S_{22\cdot1}^{-1}S_{21}S_{11}^{-1}S_{12},$$

where $X'X = nS_{11}$, $X'Y = nS_{12}$, $Y'X = nS_{21}$, $Y'Y = nS_{22}$, and

$$S_{22\cdot1} = S_{22} - S_{21}S_{11}^{-1}S_{12}.$$

We may also express $\hat J(1,2)$ as $(n - k_1)$ times the sum of the $k_2$ roots (almost
everywhere positive) of the determinantal equation $|S_{21}S_{11}^{-1}S_{12} - lS_{22\cdot1}| = 0$.

As in the preceding section,

(6.4)  $$2\hat I(1:2) = \hat J(1,2) = (n - k_1)(l_1 + l_2 + \cdots + l_{k_2}),$$

asymptotically on the null hypothesis $H_2$, has a chi-square distribution with
$k_1k_2$ d.f. (see Section 10). By replacing $S_{22\cdot1}$ by its value as given above,
$|S_{21}S_{11}^{-1}S_{12} - lS_{22\cdot1}| = 0$ becomes $|S_{21}S_{11}^{-1}S_{12} - r^2S_{22}| = 0$, where $l = r^2/(1 - r^2)$.
The r's thus defined are Hotelling's canonical correlation coefficients. (See
[20], pp. 95-99, and further references therein.) We may also write (6.4) as (cf.
[20], p. 97)

(6.5)  $$2\hat I(1:2) = \hat J(1,2) = (n - k_1)\left(\frac{r_1^2}{1 - r_1^2} + \frac{r_2^2}{1 - r_2^2} + \cdots + \frac{r_{k_2}^2}{1 - r_{k_2}^2}\right).$$

On the null hypothesis B = 0, the results are equivalent to those for the null
hypothesis that in a k-variate normal population, the set of the first $k_1$ variates
is uncorrelated with the set of the last $k_2$ variates, $k = k_1 + k_2$. The latter
hypothesis is the one considered in [20], pp. 95-99.

Linear discriminant function. For the problem of this section, consider $w_i =
\alpha'Y_{(i)} = \alpha_1y_{i1} + \alpha_2y_{i2} + \cdots + \alpha_{k_2}y_{ik_2}$, $i = 1, 2, \cdots, n$, the same linear
compound of the y's for each observation. Since the w's are normally distributed
with $\sigma_w^2 = \alpha'\Sigma\alpha$, we have for the w's

$$2I'(1:2; w) = J'(1,2; w) = \frac{(\alpha'BX_{(1)})^2 + \cdots + (\alpha'BX_{(n)})^2}{\alpha'\Sigma\alpha} = \frac{\alpha'B(X_{(1)}X_{(1)}' + \cdots + X_{(n)}X_{(n)}')B'\alpha}{\alpha'\Sigma\alpha} = \frac{\alpha'BX'XB'\alpha}{\alpha'\Sigma\alpha}.$$

To find the linear compound for which $J'(1,2; w)$ is a maximum, the usual
calculus procedures yield the result that the $\alpha$'s must satisfy $BX'XB'\alpha = \lambda\Sigma\alpha$,
where $\lambda$ is the largest root of the equation $|BX'XB' - \lambda\Sigma| = 0$. Denoting the
$k_2$ positive roots in descending order as $\lambda_1, \lambda_2, \cdots, \lambda_{k_2}$,

(6.7)  $$2I(1:2) = J(1,2) = \operatorname{tr}\Sigma^{-1}BX'XB' = \lambda_1 + \lambda_2 + \cdots + \lambda_{k_2} = J'(1,2; \lambda_1) + \cdots + J'(1,2; \lambda_{k_2}).$$

Using the estimates as in (6.3) and (6.4), we have

(6.8)  $$2\hat I(1:2) = \hat J(1,2) = (n - k_1)(l_1 + l_2 + \cdots + l_{k_2}) = \hat J'(1,2; l_1) + \hat J'(1,2; l_2) + \cdots + \hat J'(1,2; l_{k_2}).$$

The canonical correlations enter as before.
In this case, too, asymptotically on the null hypothesis $H_2$, we have the
chi-square decomposition (see Section 10)

    $\hat J'(1,2; l_{k_2}) = (n - k_1)l_{k_2} = (n - k_1)r_{k_2}^2/(1 - r_{k_2}^2)$,                 $k_1 - k_2 + 1$ d.f.
    $\hat J'(1,2; l_{k_2-1}) = (n - k_1)l_{k_2-1} = (n - k_1)r_{k_2-1}^2/(1 - r_{k_2-1}^2)$,         $k_1 - k_2 + 3$ d.f.
    . . .
    $\hat J'(1,2; l_1) = (n - k_1)l_1 = (n - k_1)r_1^2/(1 - r_1^2)$,                                 $k_1 + k_2 - 1$ d.f.
    $\hat J(1,2) = (n - k_1)\sum_{i=1}^{k_2} l_i = (n - k_1)\sum_{i=1}^{k_2} r_i^2/(1 - r_i^2)$,     $k_1k_2$ d.f.

This is to be taken in the sense that $(n - k_1)(l_{m+1} + \cdots + l_{k_2})$ is
asymptotically a chi-square, not that $(n - k_1)l_{m+1}, \cdots, (n - k_1)l_{k_2}$ have asymptotic
independent chi-square distributions. (See (10.4).)

EXAMPLE. By way of illustration, we use the data already discussed by
Hotelling and the values derived in [20], p. 98, where it was found that $r_1^2 = .1556$,
$r_2^2 = .0047$, $r_1^2/(1 - r_1^2) = .1843$, $r_2^2/(1 - r_2^2) = .0047$; and since $n - k_1 =
139 - 2 = 137$ (there were 140 observations but the values were computed
about the sample averages),

    $\hat J'(1,2; r_2) = .6439$       1 d.f.
    $\hat J'(1,2; r_1) = 25.2491$     3 d.f.
    $\hat J(1,2) = 25.8930$           4 d.f.

Since only $\hat J'(1,2; r_1)$ is significant, the linear discriminant function associated
with $r_1$, $w = -2.4404y_1 + y_2$, is the only such linear function and is
practically sufficient, confirming the inference made in [20], p. 99.
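
The components of the decomposition follow from the squared canonical correlations alone. The sketch below recomputes them for the values quoted in the example; the function names are illustrative. The paper's figures use the ratios rounded to four decimals, so the recomputed components agree only to about two decimals.

```python
def chi_square_components(r_squared, n_minus_k1):
    """(n - k1) r_i^2 / (1 - r_i^2) for each squared canonical correlation."""
    return [n_minus_k1 * r2 / (1.0 - r2) for r2 in r_squared]


def degrees_of_freedom(k1, k2):
    """d.f. attached to l_1, ..., l_k2 (largest root first), assuming k1 >= k2."""
    return [k1 + k2 - 1 - 2 * i for i in range(k2)]


components = chi_square_components([0.1556, 0.0047], 137)
dfs = degrees_of_freedom(2, 2)
total = sum(components)
total_df = sum(dfs)
```

The degrees of freedom always sum to $k_1k_2$, which is the figure attached to $\hat J(1,2)$ itself.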

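Subhypothesis. We return to the problem at the beginning of this section and
separate the $k_1$ x's into two sets of $q_1$ and $q_2$, $k_1 = q_1 + q_2$. With a corresponding
partition of the matrix B, we now have $Z_{(i)} = Y_{(i)} - CX_{(1i)} - DX_{(2i)}$, where
$X_{(i)}' = (X_{(1i)}', X_{(2i)}')$, $B = (C, D)$, where C and D are, respectively, $k_2 \times q_1$, $k_2 \times q_2$
matrices; or $Z' = Y' - CX_1' - DX_2'$, with Z and Y as previously defined and

$$X' = (X_{(1)}, X_{(2)}, \cdots, X_{(n)}) = \begin{pmatrix}X_{(11)} & \cdots & X_{(1n)}\\ X_{(21)} & \cdots & X_{(2n)}\end{pmatrix} = \begin{pmatrix}X_1'\\ X_2'\end{pmatrix}.$$

With the same assumptions as to the $Z_{(i)}$, we now consider the hypotheses

(6.9)  $$H_1\colon\ E_1(Y_{(i)}) = C_1X_{(1i)} + D_1X_{(2i)},$$
$$H_2\colon\ E_2(Y_{(i)}) = C_2X_{(1i)} + D_2X_{(2i)}.$$

Applying the same procedures as previously, it is found that now

(6.10)  $$2I(1:2) = J(1,2) = \operatorname{tr}\Sigma^{-1}\left\{((C_1 - C_2), (D_1 - D_2))\begin{pmatrix}S_{11} & S_{12}\\ S_{21} & S_{22}\end{pmatrix}\begin{pmatrix}(C_1 - C_2)'\\ (D_1 - D_2)'\end{pmatrix}\right\},$$

where

$$X'X = \begin{pmatrix}X_1'\\ X_2'\end{pmatrix}(X_1\ \ X_2) = \begin{pmatrix}X_1'X_1 & X_1'X_2\\ X_2'X_1 & X_2'X_2\end{pmatrix} = \begin{pmatrix}S_{11} & S_{12}\\ S_{21} & S_{22}\end{pmatrix}.$$

In particular, we wish to test the null hypothesis that $D_2 = 0$. For $C_1$ and $D_1$,
the estimation procedure previously used for B, [2], yields here

(6.11)  $$(\hat C_1, \hat D_1)\begin{pmatrix}S_{11} & S_{12}\\ S_{21} & S_{22}\end{pmatrix} = (Y'X_1, Y'X_2),$$

or

(6.12)  $$\hat C_1S_{11} + \hat D_1S_{21} = Y'X_1,$$
$$\hat C_1S_{12} + \hat D_1S_{22} = Y'X_2;$$

and for $C_2$,

(6.13)  $$\hat C_2S_{11} = Y'X_1.$$

From (6.12) it is readily found that

(6.14)  $$\hat D_1 = Y'X_{2\cdot1}S_{22\cdot1}^{-1}, \qquad \hat C_1 = Y'X_1S_{11}^{-1} - \hat D_1S_{21}S_{11}^{-1},$$

where $X_{2\cdot1} = X_2 - X_1S_{11}^{-1}S_{12}$, $S_{22\cdot1} = S_{22} - S_{21}S_{11}^{-1}S_{12}$.

For the estimate of $\Sigma$, we have as before

(6.15)  $$(n - k_1)\hat\Sigma = Y'Y - (\hat C_1, \hat D_1)\begin{pmatrix}S_{11} & S_{12}\\ S_{21} & S_{22}\end{pmatrix}\begin{pmatrix}\hat C_1'\\ \hat D_1'\end{pmatrix}$$

and for the estimate of J(1,2),

(6.16)  $$\hat J(1,2) = \operatorname{tr}\hat\Sigma^{-1}\left\{((\hat C_1 - \hat C_2), \hat D_1)\begin{pmatrix}S_{11} & S_{12}\\ S_{21} & S_{22}\end{pmatrix}\begin{pmatrix}(\hat C_1 - \hat C_2)'\\ \hat D_1'\end{pmatrix}\right\}.$$

Using the values given in (6.13) and (6.14), it is found that

(6.17)  $$(\hat C_1, \hat D_1)\begin{pmatrix}S_{11} & S_{12}\\ S_{21} & S_{22}\end{pmatrix}\begin{pmatrix}\hat C_1'\\ \hat D_1'\end{pmatrix} = \hat C_2S_{11}\hat C_2' + \hat D_1S_{22\cdot1}\hat D_1' = Y'X_1S_{11}^{-1}X_1'Y + Y'X_{2\cdot1}S_{22\cdot1}^{-1}X_{2\cdot1}'Y,$$
$$((\hat C_1 - \hat C_2), \hat D_1)\begin{pmatrix}S_{11} & S_{12}\\ S_{21} & S_{22}\end{pmatrix}\begin{pmatrix}(\hat C_1 - \hat C_2)'\\ \hat D_1'\end{pmatrix} = \hat D_1S_{22\cdot1}\hat D_1'.$$

It is readily verified that

(6.18)  $$X_1S_{11}^{-1}X_1'X_{2\cdot1}S_{22\cdot1}^{-1}X_{2\cdot1}' = 0,$$

and since $X_{2\cdot1}'X_{2\cdot1} = S_{22\cdot1}$,

(6.19)  $$(I - X_1S_{11}^{-1}X_1' - X_{2\cdot1}S_{22\cdot1}^{-1}X_{2\cdot1}')X_{2\cdot1}S_{22\cdot1}^{-1}X_{2\cdot1}' = 0;$$

that is to say, the two factors in $\hat J(1,2)$ are independent.
These results are summarized in Tables 1 and 2.
We omit a discussion of linear discriminant functions for this case.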
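
The relations (6.12) to (6.18) can be illustrated in the simplest case $q_1 = q_2 = 1$, $k_2 = 1$, where every block is a scalar. The sketch below is such a special case, not the general matrix computation; the data vectors and function names are made up for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def subhypothesis_fit(x1, x2, y):
    """Scalar-block version of (6.12)-(6.17) for one x in each set and one y."""
    s11, s12, s22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    x2_1 = [b - a * s12 / s11 for a, b in zip(x1, x2)]   # X_{2.1}
    s22_1 = s22 - s12 * s12 / s11                        # S_{22.1}
    d1 = dot(y, x2_1) / s22_1                            # D1-hat, (6.14)
    c1 = dot(y, x1) / s11 - d1 * s12 / s11               # C1-hat, (6.14)
    c2 = dot(y, x1) / s11                                # C2-hat, (6.13)
    full_ss = c1 * c1 * s11 + 2 * c1 * d1 * s12 + d1 * d1 * s22
    return {
        "x2_1": x2_1, "s22_1": s22_1, "c1": c1, "d1": d1, "c2": c2,
        "full_ss": full_ss,                 # left side of (6.17)
        "reduced_ss": c2 * c2 * s11,        # sum of squares due to C2-hat alone
        "difference_ss": d1 * d1 * s22_1,   # D1-hat S_{22.1} D1-hat'
    }


x1 = [1.0, 1.0, 1.0, 1.0, 1.0]
x2 = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1]
fit = subhypothesis_fit(x1, x2, y)
```

The orthogonality of $X_1$ and $X_{2\cdot1}$ is what lets the full sum of squares split into the part due to the first set of x's and the part tested under the subhypothesis.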


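TABLE 1
Due to        d.f.        Generalized sum of squares
[rows illegible in the source; the surviving row labels are "Difference", "Difference" and "Total"]

TABLE 2
Test                                                       Asymptotic distribution on the null hypothesis
$\operatorname{tr}\hat\Sigma^{-1}\hat BX'X\hat B'$         chi-square, $k_1k_2$ d.f.
$\operatorname{tr}\hat\Sigma^{-1}\hat D_1S_{22\cdot1}\hat D_1'$   chi-square, $q_2k_2$ d.f.

7. Homogeneity of covariance matrices. Consider problem (d) of Section 1.
For its special interest, we consider first the case of two samples and then the
general case.

(7.1) Two samples. Suppose that we have two independent samples with $n_1$
and $n_2$ independent observations, respectively, from k-variate normal populations
for which we make no specification about the means, and suppose that for the
population covariance matrices we have the two hypotheses, $H_1\colon \Sigma_1, \Sigma_2$ and
$H_2\colon \Sigma_1 = \Sigma_2 = \Sigma$.

Using the notation already introduced in Section 4, we want to determine $g^*$
with $a = (\bar x_{(1)}, \bar x_{(2)}, V_1, V_2)$, which means that we seek the values of $t_{(i)}$, $T_i$,
$i = 1, 2$, which will maximize (cf. (5.2))

(7.1.1)  $$I^* = t_{(1)}'\bar x_{(1)} - t_{(1)}'\mu_{(1)} - \frac{1}{2n_1}t_{(1)}'\Sigma t_{(1)} + \operatorname{tr} T_1V_1 + \frac{N_1}{2}\log|I - 2\Sigma T_1|$$
$$\qquad + t_{(2)}'\bar x_{(2)} - t_{(2)}'\mu_{(2)} - \frac{1}{2n_2}t_{(2)}'\Sigma t_{(2)} + \operatorname{tr} T_2V_2 + \frac{N_2}{2}\log|I - 2\Sigma T_2|.$$

Following the procedure as used for (4.3), we find that the sought-for values
are given by (cf. (5.3))

(7.1.2)  $$t_{(1)} = n_1\Sigma^{-1}(\bar x_{(1)} - \mu_{(1)}), \qquad t_{(2)} = n_2\Sigma^{-1}(\bar x_{(2)} - \mu_{(2)}),$$
$$T_1 = \tfrac12\Sigma^{-1} - \tfrac12 S_1^{-1}, \qquad T_2 = \tfrac12\Sigma^{-1} - \tfrac12 S_2^{-1},$$

for which values $I^*$ of (7.1.1) becomes

(7.1.3)  $$I^* = \frac{n_1}{2}(\bar x_{(1)} - \mu_{(1)})'\Sigma^{-1}(\bar x_{(1)} - \mu_{(1)}) + \frac{n_2}{2}(\bar x_{(2)} - \mu_{(2)})'\Sigma^{-1}(\bar x_{(2)} - \mu_{(2)})$$
$$\qquad + \frac{N_1}{2}\left(\log\frac{|\Sigma|}{|S_1|} - k + \operatorname{tr} S_1\Sigma^{-1}\right) + \frac{N_2}{2}\left(\log\frac{|\Sigma|}{|S_2|} - k + \operatorname{tr} S_2\Sigma^{-1}\right).$$

For variations of $\mu_{(1)}$, $\mu_{(2)}$, and $\Sigma$, $I^*$ will be a minimum for $\hat\mu_{(1)}$, $\hat\mu_{(2)}$, and $\hat\Sigma$
satisfying (see [8])

(7.1.4)  $$n_1\hat\Sigma^{-1}(\bar x_{(1)} - \hat\mu_{(1)}) = 0, \qquad n_2\hat\Sigma^{-1}(\bar x_{(2)} - \hat\mu_{(2)}) = 0,$$
$$0 = -\frac{n_1}{2}(\bar x_{(1)} - \hat\mu_{(1)})'\hat\Sigma^{-1}\,d\Sigma\,\hat\Sigma^{-1}(\bar x_{(1)} - \hat\mu_{(1)}) - \frac{n_2}{2}(\bar x_{(2)} - \hat\mu_{(2)})'\hat\Sigma^{-1}\,d\Sigma\,\hat\Sigma^{-1}(\bar x_{(2)} - \hat\mu_{(2)})$$
$$\qquad + \frac{N_1}{2}\operatorname{tr}\hat\Sigma^{-1}d\Sigma - \frac{N_1}{2}\operatorname{tr} S_1\hat\Sigma^{-1}\,d\Sigma\,\hat\Sigma^{-1} + \frac{N_2}{2}\operatorname{tr}\hat\Sigma^{-1}d\Sigma - \frac{N_2}{2}\operatorname{tr} S_2\hat\Sigma^{-1}\,d\Sigma\,\hat\Sigma^{-1},$$

from which we find that

(7.1.5)  $$\hat\mu_{(1)} = \bar x_{(1)}, \qquad \hat\mu_{(2)} = \bar x_{(2)}, \qquad (N_1 + N_2)\hat\Sigma = N_1S_1 + N_2S_2 = NS,$$

where $N = N_1 + N_2$; consequently (cf. Wilks [31], p. 489),

(7.1.6)  $$2\hat I = N_1\log\frac{|S|}{|S_1|} + N_2\log\frac{|S|}{|S_2|}.$$

It is readily found that the corresponding $\hat J$ is given by (cf. [20], p. 91)

(7.1.7)  $$\hat J = \frac{N_1N_2}{2(N_1 + N_2)}\left(\operatorname{tr} S_1S_2^{-1} + \operatorname{tr} S_2S_1^{-1} - 2k\right).$$

It will be shown in Section 8 that $2\hat I$, for large $N_1$ and $N_2$, on the null
hypothesis $H_2$, has a chi-square distribution with k(k + 1)/2 d.f., and to a better
approximation, a non-central chi-square distribution, R. A. Fisher's B
distribution.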
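
The two-sample statistics (7.1.6) and (7.1.7) can be evaluated directly for k = 2. The sketch below also computes the roots F of $|S_1 - FS_2| = 0$ and checks that $\hat J$ equals the sum over the roots of $\frac{N_1N_2}{2(N_1+N_2)}(F + 1/F - 2)$, which follows from $\operatorname{tr} S_1S_2^{-1} = \sum F_i$ and $\operatorname{tr} S_2S_1^{-1} = \sum 1/F_i$. Function names and sample matrices are illustrative assumptions.

```python
import math


def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def inv2(a):
    d = det2(a)
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]


def trace_prod(a, b):
    return sum(a[i][j] * b[j][i] for i in range(2) for j in range(2))


def two_i_hat(n1, s1, n2, s2):
    """(7.1.6): N1 log|S|/|S1| + N2 log|S|/|S2| with NS = N1 S1 + N2 S2."""
    n = n1 + n2
    s = [[(n1 * s1[i][j] + n2 * s2[i][j]) / n for j in range(2)] for i in range(2)]
    return n1 * math.log(det2(s) / det2(s1)) + n2 * math.log(det2(s) / det2(s2))


def j_hat(n1, s1, n2, s2):
    """(7.1.7) with k = 2."""
    c = n1 * n2 / (2.0 * (n1 + n2))
    return c * (trace_prod(s1, inv2(s2)) + trace_prod(s2, inv2(s1)) - 4.0)


def variance_ratio_roots(s1, s2):
    """Roots F of |S1 - F S2| = 0 for symmetric 2 x 2 matrices, ascending."""
    a = det2(s2)
    b = -(s1[0][0] * s2[1][1] + s1[1][1] * s2[0][0] - 2.0 * s1[0][1] * s2[0][1])
    c = det2(s1)
    disc = math.sqrt(b * b - 4.0 * a * c)
    return (-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a)


def j_hat_components(n1, s1, n2, s2):
    c = n1 * n2 / (2.0 * (n1 + n2))
    return [c * (f + 1.0 / f - 2.0) for f in variance_ratio_roots(s1, s2)]
```

Both statistics vanish when the two sample covariance matrices coincide and are positive otherwise.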

Linear discriminant function. We seek a linear compound, the same for both
samples, $y = \alpha'x = \alpha_1x_1 + \alpha_2x_2 + \cdots + \alpha_kx_k$, which will maximize (see
(7.1.7))

(7.1.8)  $$\hat J'(y) = \frac{N_1N_2}{2(N_1 + N_2)}\left(\frac{\alpha'S_1\alpha}{\alpha'S_2\alpha} + \frac{\alpha'S_2\alpha}{\alpha'S_1\alpha} - 2\right).$$

The usual calculus procedures yield the result that $\alpha$ is obtained as a solution
of $S_1\alpha = FS_2\alpha$, where F is a root of the determinantal equation $|S_1 - FS_2| =
|N_1S_1 - lN_2S_2| = 0$, and $F = N_2l/N_1$. It is found that the same linear
function results from maximizing (see (7.1.6))

(7.1.9)  $$\hat I'(y) = \frac{N_1}{2}\log\frac{\alpha'S\alpha}{\alpha'S_1\alpha} + \frac{N_2}{2}\log\frac{\alpha'S\alpha}{\alpha'S_2\alpha}.$$

If the roots of the determinantal equation, which are almost everywhere
positive, are $F_1, F_2, \cdots, F_k$ arranged in ascending order, then, as was shown in
[20], Section 5, the maximum of $\hat J'(y)$ occurs for the linear compound associated
with $F_1$ or $F_k$ according as $F_1F_k < 1$ or $F_1F_k > 1$.

It may also be shown, readily, that

(7.1.10)  $$\hat I = \hat I'(l_1) + \hat I'(l_2) + \cdots + \hat I'(l_k),$$
$$\hat J = \hat J'(F_1) + \hat J'(F_2) + \cdots + \hat J'(F_k),$$

where

$$\hat I'(l_i) = \frac{N_1}{2}\log\frac{N_1F_i + N_2}{NF_i} + \frac{N_2}{2}\log\frac{N_1F_i + N_2}{N}, \qquad \hat J'(F_i) = \frac{N_1N_2}{2(N_1 + N_2)}\,\frac{(F_i - 1)^2}{F_i}.$$

It is conjectured that for large $N_1$ and $N_2$, (assuming that the corresponding
population parameters have null hypothesis values) the quantity $2\hat I'(l_{m+1}) + \cdots
+ 2\hat I'(l_k)$, the terms arranged in descending order of efficiency, has a
chi-square distribution with (k - m)(k - m + 1)/2 d.f.


(7.2) r-samples. Suppose that we have r independent samples, respectively, 
of N,, Ne, ---,N,, independent observations each, from k-variate normal 
populations for which we assume the means equal, and that for the population 
covariance matrices we have the two hypotheses, H;:2,, 22, ---, 2, and 
H2:2; 2 = --- = 2, = &. Thus, for the r samples, corresponding to MH, 
and H; , we have, respectively, 


Dh . 0 |N, 
21 
" . |Ne . . . . 
(7.2.1) o@ -. : S N=N,+N.2+- +» N.. 
~ 
0 see Be Ne 
> ” 0 Ni 
i. kos -iNe 
(7.2.2) G2) = . eee Ss 
0--- 2/ N, 





(1:2) = 1 tr : 
(7.2.3) \ @---Z—z/ \o- = 
“NN, \>| a kN 
= EF (los > +tr2z,;z ) - > 
>:::0 >---Q 
JQ,2) = trp fie. f Jap toe: 
QO «-. Z, Os. J 
z= 0 Zr 0 
(7.2.4) f : J—|: : 
0 z=" 0 --- ZF 
—~ N at 1 J 
= 2 > (trXs2” + tr=zi) — KN. 
iml & 





138 S. KULLBACK 


If we compute the sample values about the sample averages, then we estimate $I(1:2)$ and $J(1,2)$ by taking $N_1, N_2, \ldots, N_r$ as degrees of freedom, and replace $\Sigma_1, \Sigma_2, \ldots, \Sigma_r$, respectively, by the sample unbiased covariance matrices $S_1, S_2, \ldots, S_r$, and $\Sigma$ by $S$, where $NS = N_1 S_1 + \cdots + N_r S_r$. Thus we have (cf. Box [5] and Wilks [31], p. 489),


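$$\hat{I}(1:2) = \sum_{i=1}^{r} \frac{N_i}{2}\left(\operatorname{tr} S_i S^{-1} + \log\frac{|S|}{|S_i|}\right) - \frac{kN}{2} = \sum_{i=1}^{r} \frac{N_i}{2}\log\frac{|S|}{|S_i|}, \tag{7.2.5}$$

$$\hat{J}(1,2) = \sum_{i=1}^{r} \frac{N_i}{2}\left(\operatorname{tr} S_i S^{-1} + \operatorname{tr} S S_i^{-1}\right) - kN = \sum_{i=1}^{r} \frac{N_i}{2}\operatorname{tr} S S_i^{-1} - \frac{kN}{2} = \sum_{i<j} \frac{N_i N_j}{2N}\left(\operatorname{tr} S_i S_j^{-1} + \operatorname{tr} S_j S_i^{-1} - 2k\right). \tag{7.2.6}$$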


We omit at this time a discussion of linear discriminant functions for this case. 


8. Asymptotic distribution of $\hat{I}(1:2)$ for the homogeneity of covariance matrices. On the hypothesis $H_2$ of Section 7.2, we let


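$$N_\beta S_\beta = \Sigma^{1/2} V_\beta \Sigma^{1/2}, \qquad NS = \Sigma^{1/2} V \Sigma^{1/2}. \tag{8.1}$$

These equations define transformations linear in the elements of the matrices $S_\beta$, $S$ or $V_\beta$, $V$. The Jacobians of these transformations are given by [8],

$$|\Sigma/N_\beta|^{(k+1)/2} \quad\text{and}\quad |\Sigma/N|^{(k+1)/2}.$$

The Wishart distributions of the elements of $S_\beta$, $S$ are thereby transformed into the respective probability densities of the elements of $V_\beta$, $V$, given by

$$\frac{(1/2)^{kN_\beta/2}\, e^{-(1/2)\operatorname{tr} V_\beta}\, |V_\beta|^{(N_\beta-k-1)/2}}{\pi^{k(k-1)/4}\prod_{\alpha=1}^{k}\Gamma\!\left(\frac{N_\beta+1-\alpha}{2}\right)}, \qquad \frac{(1/2)^{kN/2}\, e^{-(1/2)\operatorname{tr} V}\, |V|^{(N-k-1)/2}}{\pi^{k(k-1)/4}\prod_{\alpha=1}^{k}\Gamma\!\left(\frac{N+1-\alpha}{2}\right)}. \tag{8.2}$$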


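Applying the transformations in (8.1) to $\hat{I}(1:2)$ in (7.2.5), we get

$$\hat{I}(1:2) = \sum_{\beta=1}^{r} \frac{N_\beta}{2}\left(\log\frac{|V|}{|V_\beta|} + k\log\frac{N_\beta}{N}\right). \tag{8.3}$$

Since the $r$ samples are independent, the characteristic function of the distribution of

$$\sum_{\beta=1}^{r} N_\beta \log\frac{|V|}{|V_\beta|} = N\log|V| - \sum_{\beta=1}^{r} N_\beta\log|V_\beta|$$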










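is given by (cf. Box [5], p. 321)

$$\begin{aligned}
\phi(t) &= \int \prod_{\beta=1}^{r} \frac{(1/2)^{kN_\beta/2}\, e^{-(1/2)\operatorname{tr} V_\beta}\, |V_\beta|^{(N_\beta(1-2it)-k-1)/2}}{\pi^{k(k-1)/4}\prod_{\alpha=1}^{k}\Gamma\!\left(\frac{N_\beta+1-\alpha}{2}\right)}\; |V|^{Nit} \prod_{\beta=1}^{r}\prod_{i\le j} dv_{\beta ij} \\
&= \prod_{\beta=1}^{r}\prod_{\alpha=1}^{k} \frac{\Gamma\!\left(\frac{N_\beta(1-2it)+1-\alpha}{2}\right)}{\Gamma\!\left(\frac{N_\beta+1-\alpha}{2}\right)} \int \frac{(1/2)^{kN/2}\, e^{-(1/2)\operatorname{tr} V}\, |V|^{[N(1-2it)-k-1]/2+Nit}}{\pi^{k(k-1)/4}\prod_{\alpha=1}^{k}\Gamma\!\left(\frac{N(1-2it)+1-\alpha}{2}\right)} \prod_{i\le j} dv_{ij} \\
&= \prod_{\alpha=1}^{k} \frac{\Gamma\!\left(\frac{N+1-\alpha}{2}\right)}{\Gamma\!\left(\frac{N(1-2it)+1-\alpha}{2}\right)} \prod_{\beta=1}^{r}\prod_{\alpha=1}^{k} \frac{\Gamma\!\left(\frac{N_\beta(1-2it)+1-\alpha}{2}\right)}{\Gamma\!\left(\frac{N_\beta+1-\alpha}{2}\right)},
\end{aligned} \tag{8.4}$$

where the middle result follows from the reproductive property of the Wishart distribution [33]. We will use Stirling's approximation

$$\log\Gamma(p) = \tfrac{1}{2}\log 2\pi + (p - \tfrac{1}{2})\log p - p + \frac{1}{12p} - \frac{1}{360p^3} + O(1/p^5)$$

to get an approximate value for large $N_\beta$ in (8.4). We have that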


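$$\begin{aligned}
\log\frac{\Gamma\!\left(\frac{N_\beta(1-2it)+1-\alpha}{2}\right)}{\Gamma\!\left(\frac{N_\beta+1-\alpha}{2}\right)}
&= \frac{N_\beta(1-2it)-\alpha}{2}\log\frac{N_\beta(1-2it)+1-\alpha}{2} - \frac{N_\beta(1-2it)+1-\alpha}{2} \\
&\quad + \frac{1}{6(N_\beta(1-2it)+1-\alpha)} - \frac{1}{45(N_\beta(1-2it)+1-\alpha)^3} \\
&\quad - \frac{N_\beta-\alpha}{2}\log\frac{N_\beta+1-\alpha}{2} + \frac{N_\beta+1-\alpha}{2} \\
&\quad - \frac{1}{6(N_\beta+1-\alpha)} + \frac{1}{45(N_\beta+1-\alpha)^3} + O(1/N_\beta^5),
\end{aligned} \tag{8.5}$$

and after some algebraic manipulation, the right member of (8.5) may be written as

$$-itN_\beta\log\frac{N_\beta}{2} + \frac{N_\beta(1-2it)-\alpha}{2}\log(1-2it) + N_\beta it + \frac{(3\alpha^2-1)\,2it}{12N_\beta(1-2it)} + O(1/N_\beta^2).$$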







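We therefore have

$$\begin{aligned}
\log\phi(t) &= \sum_{\alpha=1}^{k}\left(itN\log\frac{N}{2} - \frac{N(1-2it)-\alpha}{2}\log(1-2it) - Nit - \frac{(3\alpha^2-1)it}{6N(1-2it)} - O(1/N^2)\right) \\
&\quad + \sum_{\alpha=1}^{k}\sum_{\beta=1}^{r}\left(-itN_\beta\log\frac{N_\beta}{2} + \frac{N_\beta(1-2it)-\alpha}{2}\log(1-2it) + N_\beta it + \frac{(3\alpha^2-1)it}{6N_\beta(1-2it)} + O(1/N_\beta^2)\right) \\
&= -it\sum_{\beta=1}^{r} kN_\beta\log\frac{N_\beta}{N} - \frac{(r-1)k(k+1)}{4}\log(1-2it) \\
&\quad + \frac{it}{1-2it}\cdot\frac{2k^3+3k^2-k}{12}\left(\sum_{\beta=1}^{r}\frac{1}{N_\beta} - \frac{1}{N}\right) + \sum_{\beta=1}^{r} O(1/N_\beta^2) - O(1/N^2).
\end{aligned} \tag{8.6}$$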


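Neglecting the last terms in (8.6), we have that

$$\phi(t) = (1-2it)^{-(r-1)k(k+1)/4}\exp\left(-it\sum_{\beta=1}^{r} kN_\beta\log\frac{N_\beta}{N} + \frac{Cit}{1-2it}\right), \tag{8.7}$$

where $C = (2k^3 + 3k^2 - k)\left(\sum_{\beta=1}^{r} 1/N_\beta - 1/N\right)/12$.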


Because of (8.3) and (8.4), writing $\zeta = 2\hat{I}(1:2)$, the probability density of $\zeta$ is given by

$$D(\zeta) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{e^{-it\zeta + Cit/(1-2it)}}{(1-2it)^{(r-1)k(k+1)/4}}\,dt. \tag{8.8}$$

If we neglect the term with $C$, it follows that $D(\zeta)$ is a chi-square distribution with $(r - 1)k(k + 1)/2$ d.f.; otherwise, by integrating (8.8) (see [23], p. 86), we get, since $\zeta$ is real and positive and $(r - 1)k(k + 1)/4 > 0$,

$$D(\zeta) = \tfrac{1}{2}\, e^{-C/2 - \zeta/2}\left(\frac{\zeta}{C}\right)^{(n-1)/2} I_{n-1}(\sqrt{C\zeta}), \tag{8.9}$$

where $n = (r - 1)k(k + 1)/4$ and $I_{n-1}(\sqrt{C\zeta})$ is the Bessel function of purely imaginary argument [30]

$$I_{n-1}(\sqrt{C\zeta}) = \sum_{j=0}^{\infty} \frac{(C\zeta/4)^{(n-1)/2+j}}{j!\,\Gamma(n+j)}.$$

The distribution given by (8.9) is the non-central chi-square distribution and is Fisher's B distribution ([10], p. 14.665) if we write $C = \beta^2$, $\zeta = B^2$, $2n = n_1$. The case $k = 1$ is the Bartlett test for homogeneity of variance [3], [5]. The approximation to the logarithm of the characteristic function of $\zeta$, i.e., $-n\log(1 - 2it) + Cit/(1 - 2it)$, corresponds to that of Box [5], formula 29,







p. 323, retaining only the first term in his sum; i.e., $(a_1/u)[1/(1 - 2it) - 1]$ (there is a misprint in the formula) is $Cit/(1 - 2it)$ as used here, as may be verified by using the appropriate formulas with $\beta = 0$ on pp. 324-325 of [5].

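For large $n$ we may approximate $I_{n-1}(\sqrt{C\zeta})$ in (8.9) by writing

$$I_{n-1}(\sqrt{C\zeta}) = \frac{(C\zeta/4)^{(n-1)/2}}{\Gamma(n)}\sum_{j=0}^{\infty}\frac{(C\zeta/4)^{j}\,\Gamma(n)}{j!\,\Gamma(n+j)} \approx \frac{(C\zeta/4)^{(n-1)/2}}{\Gamma(n)}\sum_{j=0}^{\infty}\frac{1}{j!}\left(\frac{C\zeta}{4n}\right)^{j} = \frac{(C\zeta/4)^{(n-1)/2}}{\Gamma(n)}\, e^{C\zeta/4n},$$

and thereby get

$$D(\zeta) = \frac{e^{-C/2 - \zeta[1-(C/2n)]/2}\,(\zeta/2)^{n-1}}{2\,\Gamma(n)}. \tag{8.10}$$

If we set $\zeta[1 - (C/2n)] = \chi^2$, (8.10) yields

$$D(\chi^2)\,d\chi^2 = \frac{e^{-C/2}}{\left(1 - \frac{C}{2n}\right)^{n}}\cdot\frac{e^{-\chi^2/2}\,(\chi^2/2)^{n-1}\,d(\chi^2/2)}{\Gamma(n)} \approx \frac{e^{-\chi^2/2}\,(\chi^2/2)^{n-1}\,d(\chi^2/2)}{\Gamma(n)},$$

or $\zeta[1 - (C/2n)]$ asymptotically has a chi-square distribution with $2n = (r - 1)k(k + 1)/2$ d.f.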


It is readily verified that $1 - (C/2n) = \rho$, Box's scale factor in the chi-square approximation ([5], p. 329).
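The identity between $1 - C/2n$ and the scale factor can be checked with exact rational arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the closed form used for the scale factor, $1 - \frac{2k^2+3k-1}{6(k+1)(r-1)}\left(\sum 1/N_\beta - 1/N\right)$, is the form in which Box's factor is commonly quoted, taken here as an assumption rather than from the text above.

```python
from fractions import Fraction


def correction_terms(k, dof):
    """Return (C, n, 1 - C/2n) for k variates and degrees of freedom N_1..N_r."""
    r = len(dof)
    total = sum(dof)
    spread = sum(Fraction(1, v) for v in dof) - Fraction(1, total)
    c = Fraction(2 * k**3 + 3 * k**2 - k, 12) * spread
    n = Fraction((r - 1) * k * (k + 1), 4)
    return c, n, 1 - c / (2 * n)


def quoted_scale_factor(k, dof):
    """Scale factor in its commonly quoted closed form (an assumption here)."""
    r = len(dof)
    total = sum(dof)
    spread = sum(Fraction(1, v) for v in dof) - Fraction(1, total)
    return 1 - Fraction(2 * k * k + 3 * k - 1, 6 * (k + 1) * (r - 1)) * spread


if __name__ == "__main__":
    for k, dof in [(2, [24, 24]), (3, [95, 208]), (2, [11] * 5)]:
        c, n, rho = correction_terms(k, dof)
        print(k, dof, float(c), float(n), float(rho),
              rho == quoted_scale_factor(k, dof))
```

Because `Fraction` is exact, the final comparison tests the algebraic identity itself rather than agreement up to rounding.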

For other approximations to (8.9), see Abdel-Aty [1]. 

EXAMPLES: (a) For the first example we use the data given by Smith [29], Table 2, which he used to calculate a linear discriminant function for a group of 25 normal persons and 25 psychotics. Here $k = 2$, $r = 2$,

$$S_1 = \begin{pmatrix} 6.92 & -5.27 \\ -5.27 & 40.89 \end{pmatrix}, \qquad S_2 = \begin{pmatrix} 36.75 & 13.92 \\ 13.92 & 287.92 \end{pmatrix}, \qquad S = \begin{pmatrix} 21.83 & 4.33 \\ 4.33 & 164.40 \end{pmatrix},$$

$N_1 = N_2 = 24$, $N = 48$,

$|S_1| = 255.1859$, $|S_2| = 10387.2936$, $|S| = 3570.1031$,

$2\hat{I} = 24\log(3570.1031/255.1859) + 24\log(3570.1031/10387.2936) = 37.7268$,

$C = (16 + 12 - 2)(2/24 - 1/48)/12 = .135416 = \beta^2$, $\beta = .368$,

$n = (2 - 1)(2)(3)/4$, $2n = 3 = n_1$,

$\zeta = 2\hat{I} = 37.7268 = B^2$, $B = 6.14$.







In Fisher's B Table ([10], p. 14.665) we find the 5 per cent points for $n_1 = 3$ and $\beta = .2$ and $.4$ to be, respectively, 2.8140 and 2.8680. We therefore reject the null
hypothesis of equality of the population covariance matrices. Smith [29] does 
remark that the correlations are not significant, but the variances of the psy- 
chotics are significantly greater than those of the normals. 
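
The arithmetic of example (a) can be retraced from the printed determinants. The following is a minimal sketch, not the author's computation: the helper names are hypothetical, natural logarithms are assumed, and the determinants are entered exactly as printed. With these inputs the statistic comes out close to, though not digit-for-digit equal to, the printed 37.7268.

```python
import math

# Determinants and degrees of freedom as printed in example (a).
DETS = {"S1": 255.1859, "S2": 10387.2936}
DET_POOLED = 3570.1031
DOF = {"S1": 24, "S2": 24}
K = 2  # number of variates


def homogeneity_statistic(det_pooled, dets, dof):
    """zeta = 2*I-hat = sum of N_i * log(|S| / |S_i|), natural logs assumed."""
    return sum(dof[name] * math.log(det_pooled / dets[name]) for name in dets)


def noncentrality(k, dof_values):
    """C = (2k^3 + 3k^2 - k)(sum 1/N_i - 1/N)/12, as in the text."""
    total = sum(dof_values)
    spread = sum(1.0 / v for v in dof_values) - 1.0 / total
    return (2 * k**3 + 3 * k**2 - k) * spread / 12.0


zeta = homogeneity_statistic(DET_POOLED, DETS, DOF)
C = noncentrality(K, list(DOF.values()))
B = math.sqrt(zeta)                       # entry value for Fisher's B table
beta = math.sqrt(C)                       # table parameter beta
n1 = (len(DOF) - 1) * K * (K + 1) // 2    # table parameter n_1 = 2n

if __name__ == "__main__":
    print(f"zeta = {zeta:.4f}, C = {C:.6f}, beta = {beta:.3f}, "
          f"B = {B:.2f}, n1 = {n1}")
```

Since the computed $B$ is far above the tabulated 5 per cent points near 2.8, the small difference in the statistic does not affect the conclusion.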

(b) For the second example, we use the data given by Kossack [19] for a problem of classifying an A.S.T.P. pre-engineering trainee as to whether he would do unsatisfactory or satisfactory work in his first-term mathematics course. The three variables used are $x_1$, a mathematics placement test score; $x_2$, a high school mathematics score; $x_3$, the Army General Classification Test score. There were 96 trainees who did unsatisfactory work and 209 who performed satisfactory work. Here $k = 3$, $r = 2$, $N_1 = 95$, $N_2 = 208$, $N = 303$,

$$S_1 = \begin{pmatrix} 133.8592 & 7.0572 & 2.0717 \\ 7.0572 & 4.1288 & -2.0109 \\ 2.0717 & -2.0109 & 27.7016 \end{pmatrix}, \qquad S_2 = \begin{pmatrix} 217.1505 & 14.0692 & 35.7085 \\ 14.0692 & 3.9820 & .4031 \\ 35.7085 & .4031 & 72.7206 \end{pmatrix},$$

$$S = \begin{pmatrix} 191.04 & 11.871 & 25.162 \\ 11.871 & 4.0280 & -.35378 \\ 25.162 & -.35378 & 58.606 \end{pmatrix},$$

$|S_1| = 13313$, $|S_2| = 43779$, $|S| = 34053$,

$2\hat{I} = 95\log(34053/13313) + 208\log(34053/43779) = 227.0867$,

$C = (54 + 27 - 3)(1/95 + 1/208 - 1/303)/12 = .078221 = \beta^2$, $\beta = .28$,

$n = (2 - 1)(3)(4)/4$, $2n = 6 = n_1$,

$\zeta = 2\hat{I} = 227.0867 = B^2$, $B = 15.06$.


In Fisher's B Table ([10], p. 14.665) we find the 5 per cent points for $n_1 = 6$ and $\beta = .2$ and $.4$ to be, respectively, 3.5602 and 3.5951. We therefore reject the
null hypothesis of equality of the population covariance matrices. An assump- 
tion of equality is, however, implicit in the procedure used by Kossack. 

(c) For the third example, we use the data given by Pearson and Wilks [24], for five samples of twelve observations each on the strength and hardness in aluminum die-castings. Based on their data (note that they did not use the unbiased estimates), the details of which are not repeated here,

$k = 2$, $r = 5$, $N_1 = \cdots = N_5 = 11$, $N = 55$,

$\log|S_1| = 5.82588$, $\log|S_2| = 6.63942$, $\log|S_3| = 5.31904$,

$\log|S_4| = 6.66973$, $\log|S_5| = 5.35937$, $\log|S| = 6.13953$,

$2\hat{I} = 55(6.13953) - 11(29.81344) = 9.726$,

$C = (16 + 12 - 2)(5/11 - 1/55)/12 = .945454 = \beta^2$, $\beta = .972$,

$n = (5 - 1)(2)(3)/4$, $2n = 12 = n_1$,

$\zeta = 2\hat{I} = 9.726 = B^2$, $B = 3.12$.







In Fisher's B Table ([10], p. 14.665) we find the 5 per cent points for $n_1 = 7$ (the largest there tabulated), and $\beta = 0.8$ and $1.0$ to be, respectively, 3.9144 and 4.0005. Since the tabulated values increase with increasing $n_1$ for a fixed $\beta$, we do not, in this case, reject the null hypothesis of equality of population
covariance matrices. This is consistent with the conclusion reached by Pearson 
and Wilks [24]. 
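
Example (c) can also be approached through the scaled chi-square approximation of Section 8 instead of Fisher's table. The sketch below is illustrative: it uses the printed log-determinants, hypothetical variable names, and the standard 5 per cent point of chi-square with 12 d.f. (21.026), which is supplied here and does not appear in the text.

```python
# Printed log-determinants for the five samples and for the pooled matrix.
LOG_DETS = [5.82588, 6.63942, 5.31904, 6.66973, 5.35937]
LOG_DET_POOLED = 6.13953
DOF = [11] * 5
K = 2

N = sum(DOF)
# zeta = 2*I-hat = N log|S| - sum N_i log|S_i|
zeta = N * LOG_DET_POOLED - sum(v * ld for v, ld in zip(DOF, LOG_DETS))

# C and n as defined in Section 8.
C = (2 * K**3 + 3 * K**2 - K) * (sum(1.0 / v for v in DOF) - 1.0 / N) / 12.0
n = (len(DOF) - 1) * K * (K + 1) / 4.0

# zeta * (1 - C/2n) is referred to chi-square with 2n degrees of freedom.
scaled = zeta * (1.0 - C / (2.0 * n))

CHI2_95_12DF = 21.026  # standard table value, an input assumed here

if __name__ == "__main__":
    print(f"zeta = {zeta:.3f}, C = {C:.6f}, 2n = {2 * n:.0f}, "
          f"scaled = {scaled:.3f}, reject = {scaled > CHI2_95_12DF}")
```

Under these assumptions the scaled statistic falls well below the 5 per cent point, in agreement with the decision reached from Fisher's table.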


9. Asymptotic distribution of 7; . In (4.9) we defined 7, and made certain 
statements about its asymptotic distribution which we will now confirm.

It is known that the logarithm of the characteristic function of the distribution of $2\hat{I}$ is given by (see [31], p. 492; [4])

$$\log\phi(t) = (k - 1)\log\frac{\Gamma\!\left(\frac{N}{2}\right)}{\Gamma\!\left(\frac{N(1-2it)}{2}\right)} + \sum_{\alpha=1}^{k-1}\log\frac{\Gamma\!\left(\frac{N(1-2it)-\alpha}{2}\right)}{\Gamma\!\left(\frac{N-\alpha}{2}\right)}. \tag{9.1}$$

Employing Stirling's approximation as in (8.5), and retaining comparable terms as in (8.7), we have

$$\log\phi(t) = -\frac{k(k-1)}{4}\log(1 - 2it) + \frac{Cit}{1 - 2it}, \tag{9.2}$$

where $C = k(k - 1)(2k + 5)/12N$.

The statement at the end of Section 4 then follows from (9.2), (8.8), and (8.9). From (8.11) we may also deduce that

$$2\hat{I}\left(1 - \frac{k(k-1)(2k+5)}{6Nk(k-1)}\right) = -\left(N - \tfrac{1}{6}(2k + 5)\right)\log|R|$$

asymptotically has a chi-square distribution with $k(k - 1)/2$ d.f. This latter result is given by Bartlett [4].


10. Asymptotic distribution of $\hat{J}(1,2)$ for the linear hypothesis. From results derived by Fisher [9], Girshick [11], Hsu [15], [16], [17], and Roy [27], it is known that the probability density of the distribution of the roots of $|S^* - lS| = 0$ (see (5.18)), for $(n - r)$ large, is given by

$$\frac{(1/2)^{(r-1)p/2}\,\pi^{p/2}\,(l_1 \cdots l_p)^{(r-p-2)/2}\, e^{-\frac{1}{2}(l_1 + \cdots + l_p)}\prod_{i>j}(l_i - l_j)}{\prod_{\alpha=1}^{p}\Gamma\!\left(\frac{r-\alpha}{2}\right)\Gamma\!\left(\frac{p+1-\alpha}{2}\right)}, \tag{10.1}$$

and that of the roots of |SxSi' Si — S221) = 0 (see (6.3), (6.4)), for (n — ki) 
large, is given by 





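$$\frac{(1/2)^{k_1k_2/2}\,\pi^{k_2/2}\,(V_1 \cdots V_{k_2})^{(k_1-k_2-1)/2}\, e^{-\frac{1}{2}(V_1 + \cdots + V_{k_2})}\prod_{i>j}(V_i - V_j)}{\prod_{\alpha=1}^{k_2}\Gamma\!\left(\frac{k_1+1-\alpha}{2}\right)\Gamma\!\left(\frac{k_2+1-\alpha}{2}\right)}, \tag{10.2}$$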


where V; = (n — ky). 










From (10.1) and (10.2) it is readily derived that the characteristic functions of the asymptotic distributions of $2\hat{I}(1:2) = \hat{J}(1,2)$ in (5.18) and (6.4) are, respectively, $(1 - 2it)^{-p(r-1)/2}$ and $(1 - 2it)^{-k_1k_2/2}$, whence the conclusion as to their chi-square distributions. The chi-square decompositions in Section 5 and Section 6 follow from the fact that asymptotically the distributions of $l_{m+1}, \ldots, l_p$ of (10.1) and $V_{m+1}, \ldots, V_{k_2}$ of (10.2), assuming that the corresponding population parameters have the null hypothesis values, are independent of the distribution of the remaining roots and are given, respectively, by

$$\frac{(1/2)^{(r-m-1)(p-m)/2}\,\pi^{(p-m)/2}\,(l_{m+1} \cdots l_p)^{(r-p-2)/2}\, e^{-\frac{1}{2}(l_{m+1} + \cdots + l_p)}\prod_{i>j}(l_i - l_j)}{\prod_{\alpha=1}^{p-m}\Gamma\!\left(\frac{r-m-\alpha}{2}\right)\Gamma\!\left(\frac{p-m+1-\alpha}{2}\right)},$$

$$\frac{(1/2)^{(k_1-m)(k_2-m)/2}\,\pi^{(k_2-m)/2}\,(V_{m+1} \cdots V_{k_2})^{(k_1-k_2-1)/2}\, e^{-\frac{1}{2}(V_{m+1} + \cdots + V_{k_2})}\prod_{i>j}(V_i - V_j)}{\prod_{\alpha=1}^{k_2-m}\Gamma\!\left(\frac{k_1-m+1-\alpha}{2}\right)\Gamma\!\left(\frac{k_2-m+1-\alpha}{2}\right)}.$$


The characteristic function of the distribution of 27 of (7.1.10) could also 
have been derived from the distribution of the roots of |N,S, — IN2S,| = 0, 


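given by

$$\pi^{k/2}\prod_{\alpha=1}^{k}\frac{\Gamma\!\left(\frac{N_1+N_2+1-\alpha}{2}\right)}{\Gamma\!\left(\frac{N_1+1-\alpha}{2}\right)\Gamma\!\left(\frac{N_2+1-\alpha}{2}\right)\Gamma\!\left(\frac{k+1-\alpha}{2}\right)}\cdot\frac{(l_1 \cdots l_k)^{(N_1-k-1)/2}\prod_{i>j}(l_i - l_j)}{[(1 + l_1)\cdots(1 + l_k)]^{(N_1+N_2)/2}}.$$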

11. Concluding remarks. The validity of the conjecture at the end of Sec- 
tion 7.1 is under investigation, as well as the distributions of J and J’(F,) of 
Section 7, and related power functions. 

It might also be mentioned that we have a basis for assessing the cost of trading observations for dimensions. If there is more than one significant linear discriminant function, then $N_1$ observations with the linear function associated with $\lambda_1$ (one dimension) would be as effective as $N$ observations with the original multidimensional variables, where $NJ(1,2) = N_1J'(1,2; \lambda_1)$. Similar conclusions hold for more than one linear function.

Procedures similar to those used herein to estimate $I(1:2)$ and $J(1,2)$ are also applicable to problems of testing appropriate hypotheses for other than normal populations.










I am indebted to the referees for comments which have contributed to improve- 
ments in this paper. 


REFERENCES

[1] S. H. Abdel-Aty, "Approximate formulae for the percentage points and the probability integral of the non-central χ² distribution," Biometrika, Vol. 41 (1954), pp. 538-540.
[2] T. W. Anderson, "The asymptotic distribution of certain characteristic roots and vectors," Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, 1951, pp. 103-130.
[3] M. S. Bartlett, "Properties of sufficiency and statistical tests," Proc. Roy. Soc. London, Ser. A, Vol. 160 (1937), pp. 268-282.
[4] M. S. Bartlett, "Tests of significance in factor analysis," Brit. J. Psychology, Stat. Sec., Vol. 3 (1950), pp. 77-85.
[5] G. E. P. Box, "A general distribution theory for a class of likelihood criteria," Biometrika, Vol. 36 (1949), pp. 317-346.
[6] H. Cramér, "Sur un nouveau théorème-limite de la théorie des probabilités," Actualités Scientifiques et Industrielles, No. 736, Hermann & Cie, Paris, 1938.
[7] H. E. Daniels, "Saddlepoint approximations in statistics," Ann. Math. Stat., Vol. 25 (1954), pp. 631-650.
[8] W. L. Deemer and I. Olkin, "The Jacobians of certain matrix transformations useful in multivariate analysis," Biometrika, Vol. 38 (1951), pp. 345-367.
[9] R. A. Fisher, "The sampling distribution of some statistics obtained from non-linear equations," Ann. Eugenics, Vol. 9 (1939), pp. 238-249.
[10] R. A. Fisher, Contributions to Mathematical Statistics, John Wiley & Sons, Inc., New York, 1950.
[11] M. A. Girshick, "On the sampling theory of the roots of determinantal equations," Ann. Math. Stat., Vol. 10 (1939), pp. 203-224.
[12] S. W. Greenhouse, "On the problem of discrimination between statistical populations," M. A. Thesis, The George Washington University, 1954.
[13] H. Hotelling, "A generalized T test and measure of multivariate dispersion," Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, 1951, pp. 23-41.
[14] R. P. Hoyt, "Estimates and asymptotic distributions of certain statistics in information theory," Dissertation, The Graduate Council of The George Washington University, 1953.
[15] P. L. Hsu, "On the distribution of roots of certain determinantal equations," Ann. Eugenics, Vol. 9 (1939), pp. 250-258.
[16] P. L. Hsu, "On the limiting distribution of the canonical correlations," Biometrika, Vol. 32 (1941-42), pp. 38-45.
[17] P. L. Hsu, "On the limiting distribution of roots of a determinantal equation," J. London Math. Soc., Vol. 16 (1941), pp. 183-194.
[18] A. I. Khinchin, Mathematical Foundations of Statistical Mechanics, Dover Publications, New York, 1949.
[19] R. F. Kossack, "On the mechanics of classification," Ann. Math. Stat., Vol. 16 (1945), pp. 95-98.
[20] S. Kullback, "An application of information theory to multivariate analysis," Ann. Math. Stat., Vol. 23 (1952), pp. 88-102.
[21] S. Kullback, "Certain inequalities in information theory and the Cramér-Rao inequality," Ann. Math. Stat., Vol. 25 (1954), pp. 745-751.
[22] S. Kullback and R. A. Leibler, "On information and sufficiency," Ann. Math. Stat., Vol. 22 (1951), pp. 79-86.
[23] N. W. McLachlan, Complex Variable and Operational Calculus with Technical Applications, Cambridge University Press, 1939.
[24] E. S. Pearson and S. S. Wilks, "Methods of statistical analysis appropriate for k samples of two variables," Biometrika, Vol. 25 (1933), pp. 353-378.
[25] C. R. Rao, Advanced Statistical Methods in Biometric Research, John Wiley & Sons, Inc., New York, 1952.
[26] D. D. Rippe, "Statistical rank and sampling variation of the results of factorization of covariance matrices," Doctoral Thesis on file at the University of Michigan, 1951.
[27] S. N. Roy, "p-statistics, or some generalizations in analysis of variance appropriate to multivariate problems," Sankhyā, Vol. 4 (1939), pp. 381-396.
[28] S. N. Roy, "On a heuristic method of test construction and its use in multivariate analysis," Ann. Math. Stat., Vol. 24 (1953), pp. 220-238.
[29] C. A. B. Smith, "Some examples of discrimination," Ann. Eugenics, Vol. 13 (1947), pp. 272-282.
[30] G. N. Watson, Bessel Functions, (2d ed.), The Macmillan Co., New York, 1944.
[31] S. S. Wilks, "Certain generalizations in the analysis of variance," Biometrika, Vol. 24 (1932), pp. 471-494.
[32] S. S. Wilks, "The large sample distribution of the likelihood ratio for testing composite hypotheses," Ann. Math. Stat., Vol. 9 (1938), pp. 60-62.
[33] S. S. Wilks, Mathematical Statistics, Princeton University Press, 1943.





ON THE CHARACTERISTICS OF THE GENERAL QUEUEING PROCESS,
WITH APPLICATIONS TO RANDOM WALK¹


By J. Kiefer and J. Wolfowitz
Cornell University


Summary. The authors continue the study (initiated in [1]) of the general 
queueing process (arbitrary distributions of service time and time between 
successive arrivals, many servers) for the case ($\rho < 1$) where a limiting distribution exists. They discuss convergence with probability one of the mean waiting
time, mean queue length, mean busy period, etc. Necessary and sufficient con- 
ditions for the finiteness of various moments are given. These results have 
consequences for the theory of random walk, some of which are pointed out. 

This paper is self-contained and may be read independently of [1]; the neces- 
sary results of [1] are quoted. No previous knowledge of the theory of queues 
is required for reading either [1] or the present paper. 


Introduction. We recapitulate very briefly some of the results obtained in [1] 
in the notation of [1] to which we shall adhere without further mention.? 

Let $S$ be the totality of points $(x_1, x_2, \ldots, x_s)$ of Euclidean $s$-space such that $0 \le x_1 \le x_2 \le \cdots \le x_s$. Let $x$ and $y$ be generic points of $S$. Occasionally another letter will represent a point in $S$; it will always be clear from the context when this is so; for example, $0$ will frequently denote the origin in $s$-space.

For $i \ge 1$, let $t_i \ge t_1 = 0$ be the time of arrival of the $i$th person at a system of $s \ge 1$ machines, where he waits his turn until a machine is available to serve him, say at time $t_i + w_{i1} \ge t_i$. This machine is then occupied by him for time $R_i \ge 0$. Let $g_i = t_i - t_{i-1}$. $\{R_i\}$ and $\{g_i\}$ are independent sequences of identically distributed and independent chance variables. An $s$-dimensional random walk $\{w_i\}$, with $w_{i1}$ its first component, is useful for the study of the theory of queues. The random walk $\{w_i\}$ is constructed as follows: $w_i = (w_{i1}, \ldots, w_{is})$. Unless the contrary is explicitly stated we have $w_1 = 0$. To obtain $w_{i+1}$ from $w_i$, reorder in ascending size the quantities

$$(w_{i1} + R_i - g_{i+1})^+,\ (w_{i2} - g_{i+1})^+,\ (w_{i3} - g_{i+1})^+,\ \ldots,\ (w_{is} - g_{i+1})^+.$$

The resulting sequence is $w_{i+1}$. We have $w_{i1} \le w_{i2} \le \cdots \le w_{is}$ for all $i$. As usual, $a^+ = (a + |a|)/2$. The times $t_i + w_{ij}$ ($1 \le j \le s$) are easily seen to be the earliest times after (or at) $t_i$ at which the $s$ machines have finished serving those of the first $i - 1$ arrivals which they serve.
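
The construction of the random walk translates directly into a short simulation. The sketch below is illustrative only: the function names are hypothetical, and the exponential service and interarrival distributions in the usage example are arbitrary choices made for the illustration, not part of the paper.

```python
import random


def kw_step(w, service, gap):
    """One step of the recursion: map the sorted vector w_i to w_{i+1}.

    `service` plays the role of R_i and `gap` the role of g_{i+1}.
    """
    shifted = [w[0] + service - gap] + [x - gap for x in w[1:]]
    # Apply a^+ = max(a, 0) to each coordinate, then reorder ascending.
    return sorted(max(x, 0.0) for x in shifted)


def simulate_waits(s, n, draw_service, draw_gap, seed=0):
    """Return the waiting times w_{11}, ..., w_{n1}, starting from w_1 = 0."""
    rng = random.Random(seed)
    w = [0.0] * s
    waits = [w[0]]
    for _ in range(n - 1):
        w = kw_step(w, draw_service(rng), draw_gap(rng))
        waits.append(w[0])
    return waits


if __name__ == "__main__":
    # Two machines, mean service 1, mean interarrival 1: rho = 1/(2*1) = 0.5.
    waits = simulate_waits(
        s=2,
        n=10_000,
        draw_service=lambda rng: rng.expovariate(1.0),
        draw_gap=lambda rng: rng.expovariate(1.0),
    )
    print("average waiting time:", sum(waits) / len(waits))
```

For $s = 1$ the step reduces to the familiar single-server recursion $w_{i+1} = (w_i + R_i - g_{i+1})^+$.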

Let $F_i$ ($F_i^*$) be the d.f. (distribution function) of $w_i$ ($w_{i1}$). It was shown in [1] that $F(x) = \lim_{i\to\infty} F_i(x)$ exists and satisfies a certain integral equation (I.E.); $F^*(z) = \lim_{i\to\infty} F_i^*(z)$ also exists. Assume $\rho = ER_i/sEg_i$ exists. $F$ and $F^*$ are d.f.'s if $\rho < 1$, and $F$ is then the unique d.f. solution to the I.E. Except in the trivial case where $P\{R_i = sg_i\} = 1$, if $\rho \ge 1$ then $F \equiv 0 \equiv F^*$, and the I.E. has no d.f. solution. Always $F^*(z) = F(z, \infty, \ldots, \infty)$. Results on the limiting length of the line are also proved in [1].

Received Dec. 21, 1954.
¹ Research under contract with the Office of Naval Research.
² The definition of $\nu$ on p. 14 of [1] should be modified trivially to read $\nu = 1$ in the case $b = \infty$.

Let $F_i(x \mid y)$ be the d.f. of $w_i$, given that $w_1 = y$; i.e.,

$$F_i(x \mid y) = P\{w_i \le x \mid w_1 = y\}.$$

It was proved in [1] that, for all $y \in S$,

$$\lim_{i\to\infty} F_i(x \mid y) = F(x).$$


Throughout this paper we shall assume that $\rho < 1$. The case $\rho \ge 1$ has little interest and was essentially disposed of in [1]; results proved in the present paper are trivial when $\rho \ge 1$. Throughout this paper we shall assume that $Eg_1 < \infty$. However, it can be shown, always easily and sometimes trivially, that all the results of [1] and all the queueing results of the present paper except Theorem 3 are valid also when $Eg_1 = \infty$. In order to eliminate the completely trivial we also assume, as was done in [1], that $ER_1 > 0$, $Eg_1 > 0$. Since $\rho < 1$ we have then $0 < ER_1 < \infty$, $0 < Eg_1 < \infty$.

In two or three places below we shall cite the first paragraph of Section 3 of 
[1]. To ease the reader’s task we now quote this paragraph in full: 

Let $\phi_j(a, b, c)$, $j = 1, \ldots, s$, be the value of $w_{(i+1),j}$ when $w_i = a$, $R_i = b$, $g_{i+1} = c$. If $d$ is a point in $s$-space, we shall say that $a \le d$ if every coordinate of $a$ is not greater than the corresponding coordinate of $d$. If now $a \le d$, then obviously

$$\phi_j(a, b, c) \le \phi_j(d, b, c)$$

for $1 \le j \le s$. Applying this argument $k$ times we obtain the following result: Let $R_{i+j-1} = b_{i+j}$, $g_{i+j} = c_{i+j}$, $j = 1, \ldots, k$. Let $w_{i+k} = e_1$ when $w_i = a_1$, and let $w_{i+k} = e_2$ when $w_i = a_2$. Then $a_1 \le a_2$ implies $e_1 \le e_2$.
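
The monotonicity property quoted above can be spot-checked numerically by driving two ordered starting points with the same service and interarrival values. This is a minimal sketch with hypothetical names, and the random inputs are arbitrary; it exercises the property on examples and does not prove it.

```python
import random


def kw_step(w, service, gap):
    """Map w_i to w_{i+1}: shift, truncate at zero, reorder ascending."""
    shifted = [w[0] + service - gap] + [x - gap for x in w[1:]]
    return sorted(max(x, 0.0) for x in shifted)


def dominated(a, d):
    """True if every coordinate of a is not greater than that of d."""
    return all(x <= y for x, y in zip(a, d))


def check_monotone(s=3, steps=50, trials=200, seed=1):
    """Run paired walks from a <= d with common inputs and test e_1 <= e_2."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = sorted(rng.uniform(0.0, 5.0) for _ in range(s))
        d = sorted(x + rng.uniform(0.0, 2.0) for x in a)
        for _ in range(steps):
            b = rng.expovariate(1.0)   # common service time
            c = rng.expovariate(0.5)   # common interarrival time
            a, d = kw_step(a, b, c), kw_step(d, b, c)
            if not dominated(a, d):
                return False
    return True


if __name__ == "__main__":
    print("ordering preserved:", check_monotone())
```

The ordering survives each step because adding a common constant, truncating at zero, and taking order statistics are all nondecreasing operations.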

The results of [1] also imply that $F(x)$ determines a stationary and metrically transitive flow; this is the process $\{w_n'\}$ defined in Section 1, below, where the relevant references to [1] are given.


1. Convergence of the mean waiting time. Let $k$ be any positive number. Define $W_n = \sum_{i=1}^{s} w_{ni}$. Since $w_{ni}$ is a nonnegative chance variable and $F_n(x) \to F(x)$, we easily have that

$$\liminf_{n} E(w_{n1})^k \ge \int (x_1)^k\,dF(x), \qquad \liminf_{n} E(W_n)^k \ge \int (x_1 + \cdots + x_s)^k\,dF(x), \tag{1.1}$$

where, of course, the right members may be infinite. From the fact (proved in [1]) that $F_n(x)$ approaches $F(x)$ from above for every $x$, we have that

$$E(w_{n1})^k \le \int (x_1)^k\,dF(x).$$

Hence

$$\lim_{n} E(w_{n1})^k = \int (x_1)^k\,dF(x).$$

Let $F_n^{**}(z \mid y)$ be the d.f. of $W_n$, given that $w_1 = y$ ($y \in S$). Hence $F_n^{**}(z \mid 0)$ is the d.f. of $W_n$. Then

$$F_{n+1}^{**}(z \mid 0) - F_n^{**}(z \mid 0) = \int \left[F_n^{**}(z \mid y) - F_n^{**}(z \mid 0)\right] dF_2(y).$$

It follows from the first paragraph of Section 3 of [1] that, if $y \in S$, the integrand in the last integral is never positive for any $z$. Hence the left member in the last equation is never positive for any $z$. Hence $F_n^{**}(z \mid 0)$ approaches its limit (which is a distribution function obtainable from $F(x)$ in an obvious way) from above. Consequently, as before,

$$E(W_n)^k \le \int \left(\sum_{i=1}^{s} x_i\right)^k dF(x).$$

From this and (1.1) we obtain

$$\lim_{n} E(W_n)^k = \int \left(\sum_{i=1}^{s} x_i\right)^k dF(x) = m_k \text{ (say)}.$$

The question as to when $m_k < \infty$ will be discussed in a later section. We define

$$m_k' = \int (x_1)^k\,dF(x), \qquad V_{nk} = \frac{1}{n}\sum_{i=1}^{n} (w_{i1})^k.$$

We now prove
THEOREM 1. We have, for any positive $k$,

$$P\{\lim_{n\to\infty} V_{nk} = m_k'\} = 1. \tag{1.3}$$

PROOF. Let $w_1'$ be an $s$-dimensional chance variable with the d.f. $F(x)$, and let $w_{n+1}'$ be obtained from $w_n'$ by using $R_n$ and $g_{n+1}$ in exactly the same manner as one obtains $w_{n+1}$ from $w_n$. Thus $w_n'$ pertains at time $t_n$. Then the process $\{w_n',\ n = 1, 2, \ldots\}$ is easily seen to be stationary, because $F(x)$ satisfies the integral equation derived in [1] (see Section 3 of [1] for details). It is proved in Section 8 of [1] that $F(x)$ is the only d.f. which satisfies the integral equation.







We shall show that this implies easily that there cannot be a Borel set $B$ in $s$-dimensional Euclidean space such that

$$0 < \int_B dF < 1,$$

and $w_1' \in B$ implies with probability one that $w_n' \in B$, $n \ge 2$. For let $\bar{B}$ be the complement of $B$, and $F(x \mid B)$ and $F(x \mid \bar{B})$ be, respectively, the conditional distribution functions on $B$ and $\bar{B}$ implied by $F(x)$. Then $F(x \mid B)$ satisfies the integral equation. On a set of $w_1'$ of probability one according to $F(x \mid \bar{B})$, $w_n' \in \bar{B}$ for $n \ge 2$ with probability one, since otherwise $P\{w_n' \in B\}$ (when $F$ is the distribution function of $w_1'$) would not be independent of $n$, contradicting the stationarity of $\{w_n'\}$. Hence $F(x \mid \bar{B})$ must also satisfy the integral equation. Clearly, $F(x \mid B)$ and $F(x \mid \bar{B})$ are not identical, in contradiction to the fact that $F(x)$ is the only d.f. that satisfies the integral equation. From the fact that there is no invariant set $B$ such that $0 < \int_B dF < 1$, the fact that $w_n'$ is a Markoff process, and Theorem 1.1, page 460 of [6] (which asserts that any set in the space of the Markoffian chance variables $w_1', w_2', \ldots$ that is invariant under a shift transformation differs from a set $B$ by a set of probability zero), we conclude that the process $w_n'$ is metrically transitive. Hence, by the ergodic theorem,

$$P\{\lim_{n\to\infty} V_{nk}' = m_k'\} = 1, \tag{1.4}$$

where

$$V_{nk}' = \frac{1}{n}\sum_{i=1}^{n} (w_{i1}')^k,$$

and of course $w_{i1}'$ is the first component of the vector $w_i'$.

From the argument in the first paragraph of Section 3 of [1], it follows that always

$$V_{nk} \le V_{nk}'. \tag{1.5}$$

Hence

$$P\{\limsup_{n\to\infty} V_{nk} \le m_k'\} = 1. \tag{1.6}$$

We shall prove that also

$$P\{\liminf_{n\to\infty} V_{nk} \ge m_k'\} = 1. \tag{1.7}$$

This will prove the theorem.

We shall now deduce (1.7) from (1.4), and for this purpose divide the argument into consideration of the four cases of Section 8 of [1]. As there defined, denote by $[a, b]$ and $[c, d]$ the smallest closed intervals for which

$$P\{a \le R_1 \le b\} = P\{c \le g_1 \le d\} = 1.$$

Of course, $b$ or $d$ or both may be $+\infty$.







CASE 1: $b > sc$. Let $t$ be so large that the point $T = (t, t, \ldots, t)$ of $S$ is such that

$$\int_{x < T} dF(x) > 0.$$

It follows from (1.4) that there exists in $S$ a point $z < T$ such that

$$P\{\lim_{n\to\infty} V_{nk}' = m_k' \mid w_1' = z\} = 1. \tag{1.8}$$

It is proved in [1] that there exists an integer $r$ such that $P\{w_r > T\} > 0$, say $= \alpha$. From this it follows that

$$P\{w_n > T \text{ for at least one } n\} = 1. \tag{1.9}$$

Let $h$ be the smallest index $n$ for which $w_n > T$; $h < \infty$ with probability one. Obviously $R_h, R_{h+1}, \ldots$ and $g_{h+1}, g_{h+2}, \ldots$ are distributed independently of $h$ and $w_h$, and have the same distribution as $R_1, R_2, \ldots$ and $g_2, g_3, \ldots$. Consequently, if we define, for $n > h$,


Veli) = ET ST 


we have, using (1.8) and the argument in the first paragraph of Section 3 of [1], that

$$P\{\liminf_{n\to\infty} V_{nk}(h) \ge m_k'\} = 1. \tag{1.10}$$

Obviously from the definition of $V_{nk}(h)$ it follows that

$$P\{\lim_{n\to\infty} (V_{nk}(h) - V_{nk}) = 0\} = 1. \tag{1.11}$$

The desired result (1.7) follows from (1.10) and (1.11).

CASE 2: $a < d$. It is proved in Section 8 of [1] that, in this case,

$$P\{w_n = 0 \text{ for some } n \ge 1\} = 1. \tag{1.12}$$

The desired result (1.3) follows from (1.4) and (1.12) by means of an argument like that in Case 1.

CasE 3:c = d S a = b < sc. It is proved in [1] that in this case there is a 
point in S, there called @, such that 
(1.13) P{w, = Wani = *** = BD for somen 2 1} = 1. 


The desired result (1.3) follows at once. 
Case 4:d S a,b S sc, and either a < bore < d. It is proved in [1] that, in 
this case, there exists an e > 0 such that the set 


“= {ylyeS,y sD}, 
where 


e € € € 
w= (0, Us-1, Usa, °°° ’ ui) 







and 
uj; = max (0, b — je — «), 


has the following properties: 
(a) P{w, eT forsomen = 1} = 1. 


(This implies at once that 


(1.14) / dF(2) > 0) 


(b) P{w, > @} > 0. 


(This implies, using the argument in the first paragraph of Section 3 of [1], 
that 


(1.15) P{w, > @ for at least onen > 1} = 1.) 


The desired result now follows exactly as in Case 1, the place of 7 being taken 
by @. 

In exactly the same manner as that employed in this section we could have proved that

$$P\left\{\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} (W_i)^k = m_k\right\} = 1 \tag{1.16}$$

and similar theorems about other moments.


2. Generalization of the lemma of Section 4 of [1]. We shall prove the following essential generalization of the fundamental lemma of Section 4 of [1] both for its use as a tool in a subsequent section and for its intrinsic interest:

LEMMA. If, for any positive $k > 0$,

$$ER_1^{k+1} < \infty, \tag{2.1}$$

then

$$\sup_n E(w_{ns} - w_{n1})^k < \infty; \tag{2.2}$$

or, what is equivalent,

$$\sup_n E\left((s - 1)w_{ns} - \sum_{j=1}^{s-1} w_{nj}\right)^k < \infty. \tag{2.3}$$

Proor. Define Y; exactly as in (4.5) of [1], i.e., 

Y; = max([(s — 1)R;, (8 — LI Riu — Ri, (8 — DRi-w — Riu — R,---, 


(a - 1), ~~ --+ — Rh 
Then (4.6) of [1] is 


Liy’,n)= P{Y, Sy} = Pik, Shy’, Rk. SHR + y’),---, 


R, ShRi +--+ Rit y’')}, 
where h = (s — 1). Let H(z) be the df. of R,. 







Define L(y’, 0) = 1. Obviously L(y’, n) is nonincreasing in n and, for n = 0, 
L(y’, n) — Liy’,n+1) = P{Y, Sy’, Ran > (Ri t+ --- +R + y')} 
(2.5) S P{Ray > h(i +--+ +R, + y’)} 
< E{l — HAIR, + --- +R, + y’'))}.- 


Hence 
1— Lly’,n) = > [(ly,i-)D —Liy, 0) 


i=] 


(2.6) " 
< > E{1 — HAR, +--+ + Re + y')}. 
t=0 


Let d be a small positive number and define 


D; = d when R; = 


D; = 0 otherwise. 


We choose d so small that d < 1 and 
Pp = P{D, = d} > 0. 


(We have earlier excluded the trivial case where R; = 0 with probability one.) 
Since R; = D,/h, if we replace the former by the latter in the right member of 
(2.6) we do not diminish any term of this member. It is well known (e.g., [2], 
p. 101) from approximations to the binomial distribution that, for suitable 
positive c; , C2, we have 


(2.7) Pi\D+-:-+D,.s ws < ae" 


When k 2 1 we have, from (2.6), 


KY,)* Sk DOG + WPL, > 5 
j=0 


0 0 


Sk DY G+ YE — AIR, + ++» Ri + MD} 


j=0 i=0 


<k> > G+ I R{1 — A(D. + --- D+ 39)} 


j=0 im 


sk dG +E — AD, + ++: + Dad} 


j= i—0 


< DO (j + 2*E{1 — H(D, + --» + Dy}. 


j=0 







We have now, applying (2.7) to the right member of (2.8), 


29) EY) sad (tore + DG +2"(1— 1 (2)). 
j=0 j=0 - 
The first series on the right of (2.9) obviously converges. Now consider the 
second. We have 
0 : | 70 ( j od) 
ds (7 + 7 (1 —H (2 )) _ Zz (j + 2)*P< Ry > te} 


») 


< j=0 


a 


(2.10) 


> \k+1 
(2) E(R, + 2)". 
pd 


In [1] (relation (4.5)) it is shown that 


((s — l)w,n — wes) S roa 
j=l 


Hence (2.3) and the lemma follow for k = 1. The proof for0 < k < 1 is almost 
the same; only a few obvious changes are needed in (2.8), (2.9), and (2.10). 


3. Finiteness of $m_k$. Of great interest is the question of when $m_k$ is finite. In this section we shall give a sufficient condition for $m_k$ to be finite (and hence a fortiori for $m_k' \le m_k$ to be finite). We shall later see that this condition is essentially necessary for $m_k$ to be finite.

THEOREM 2. If $k > 0$, and

$$ER_1^{k+1} < \infty, \tag{3.1}$$
then 

(3.2) 

and 

(3.3) 


PROOF. We assume that there exists a number $T > 0$ such that $g_1 < T$ with probability one. When we bear in mind how $w_{n+1}$ is related to $w_n$, it follows immediately that, if Theorem 2 holds in this case, it a fortiori holds in general.

In order to carry out the proof we shall assume that $m_k = \infty$ and obtain a contradiction. Let $A$ be the set $\{x \mid x_1 < T\}$. Then from (2.2) we obtain that

$$\sup_n \int_A (x_s)^k\,dF_n(x) < \infty, \tag{3.4}$$

and hence

$$\sup_n \int_A (x_1 + \cdots + x_s)^k\,dF_n(x) < \infty. \tag{3.5}$$







From the manner in which we obtain w,4; from w, we have that 
(3.6) Wau = Wat Ra — 89n41 

= T, and always we have 

Wri SW, + R,. 

We now note the inequality (2.15.1) on page 39 of [7], which states that r > 1, 
x 2 0, y 2 O imply that 
(3.8) a’ —y’ S rx” '(4 — y). 
Putting r = k+ 1,2 = Wau, y = W,, we have, from (3.6), 


Wet, — We Ss (kh + 1)(Wa + Ra — 89041)"(Rn — 89041) 


(3.9 ( k 
_ = (k+ 1)W* { (1 + Ren wn) (R, — adh. 
\ n / 

Consider the expression in brackets in the last expression of (3.9). By (3.1),
the boundedness of $g_{n+1}$, and the independence of $W_n$ from $g_{n+1}$ and $R_n$, the
conditional expected value of this bracketed expression, given $W_n$, tends to
$E(R_n - s g_{n+1}) < 0$ as $W_n \to \infty$. Hence, if $E W_n^k \to \infty$ ($= m_k$) as $n \to \infty$, (3.9)
implies that


(3.10) $\lim_{n \to \infty} E\{W_{n+1}^{k+1} - W_n^{k+1} \mid w_{n1} \ge T\} = -\infty$.


Similarly, putting $x = W_n + R_n$, $y = W_n$, and noting that $(a + b)^k \le 2^k(a^k + b^k)$
if a, b, k ≥ 0, (3.7) yields


(3.11) $W_{n+1}^{k+1} - W_n^{k+1} \le (W_n + R_n)^{k+1} - W_n^{k+1} \le (k+1)(W_n + R_n)^k R_n \le (k+1) 2^k (W_n^k + R_n^k) R_n$.


From (3.1), (3.5) and the independence of $R_n$ and $W_n$, we conclude that there
is a number c < ∞ such that


(3.12) $\sup_n E\{W_{n+1}^{k+1} - W_n^{k+1} \mid w_{n1} < T\} < c$.


From (3.10), (3.12), and the fact that (3.5) and $m_k = \infty$ imply that A has
probability > ε > 0 according to $F_n$ for all sufficiently large n, we conclude
that there is an integer $N_0$ such that $E W_{n+1}^{k+1} \le E W_n^{k+1}$ for $n \ge N_0$. Since, for
$n \le N_0$, $E W_n^{k+1} \le E(R_1 + \cdots + R_{N_0})^{k+1} < \infty$, we conclude that $\sup_n E W_n^{k+1} < \infty$,
contradicting the assumption that $m_k = \infty$. This completes the proof.


4. Necessity of the condition (3.1). The present section is devoted to the 
proof of 
THEOREM 3. If, for any positive k,


(4.1) $E R_1^{k+1} = \infty$
and $E g_1 < \infty$, then
(4.2) 







It will easily be seen from our proof that Theorem 3 is a fortiori true if ρ = 1.
Only the case ρ < 1 requires proof and this is the case we shall consider.
PROOF. Let m be so large that


$\int_M dF(x) = a > 0$


where M is the set of all points (x, 22, --- , z,) in S such that z, S m. We 
have already remarked in Section 1 that the process {w>} there defined is sta- 
tionary and metrically transitive. Let »}, v2,--- be the indices n for which 
w, € M, and define 


— 0 
Mi = Winn —™ Ve 


It follows from the ergodic theorem that 
Eu’ 


Let $\{w_n'\}$ be the process obtained from $\{w_n\}$ as follows: $w_1' = w_1 = 0$. Thereafter
$w_n' = w_n$ until the first index n, say $\nu_1'$, such that $w_{\nu_1'} \in M$; define $w_{\nu_1'}' = 0$.
We now obtain each successive $w_{n+1}'$ from its predecessor $w_n'$ by using $R_n$ and
$g_{n+1}$ in exactly the same manner as $w_{n+1}$ is obtained from $w_n$, until the next
index, say $\nu_2'$, for which $w_{\nu_2'}'$ would be in M; instead set $w_{\nu_2'}' = 0$. Continue in this
manner to define $\{w_n'\}$. Define $\mu_i' = \nu_{i+1}' - \nu_i'$. Then $\mu_1', \mu_2', \cdots$ are independent,
identically distributed chance variables. It follows from the construction of the
process $\{w_n'\}$ and the first paragraph of Section 3 of [1] that $E\mu_1' \le E\mu_1^0$. Hence
$E\mu_1'$ is finite. It follows from the strong law of large numbers that


( /; 


(4.3) P4 lim = Eui> = 1 


\ nw n ) 


We shall later show that 


(4.4) lim n p* (w;.)* = ©) = ], 


\ he i=l 


Since w, < w, it follows at once that 


(4.5) P4 lim n* > (w,,)? = o} = 1. 
\ n+>e i=l } 
Hence 


(4.6) P< lim n* >> (W)* = @) = 1. 


\ n+ i=l 


The desired result (4.2) follows from (1.16) and (4.6). 
It remains to prove (4.4). Let j(n) be defined for all integral n by 


t , 
Vin) SN < Vimy - 






We shall later prove that 

(4.7) E{(wi.)* + (wa) + +++ + (wiie)"} = &. 
From this and the strong law of large numbers it follows that 


( vi(n) 
(4.8) P 4 lim (j(n))™ > (wi.)* wo} 
i=] 


(n+ ) 


From (4.3) and (4.8) we obtain that 


( pi(n) 
(4.9) P< lim (v5) Ss (wi.)* = =| 


\ no i=] 


From (4.9) we have at once that 


(4.10) Py lim (vjin)) D> (wi,)* = «| = 1, 
i=l 


n~sa 


Also 


(4.11) Pites-«b <1 


no Vi(n) ) 


From (4.10) and (4.11) we have the desired result (4.4). 
It remains to prove (4.7). Let N be an integer so large that 


\ t=1 


(4.12) P?> gi < 2nEg, for alln = n} > Fos 


The existence of such an N follows from the strong law of large numbers. We
may also assume N so large that $2N E g_1 > m$. Let $T = 4N E g_1$. Suppose that
t ≥ T and the largest integer contained in $(t / 4 E g_1)$ is t'. Then t' ≥ N, and
(4.12) implies that the conditional probability of the event $A_t$,


(4.13) A; = {u; > t and wy, > 2/’Eg: for2 <n S #’}, 


given that w,, = #, is greater than r. (ui > ¢’ is implied by the other events 
in (4.13).) When the event A; occurs, we have 


Hy t’ 
(4.14) > (w.)* = > (w'.,)* > t'(2t'Eg,)* = et*** 
n=) n=l 


with c > 0. From (4.1) and the construction of the process $\{w_n'\}$ we have (by
considering $((R_1 - g_2)^+)^{k+1}$ on the set where $g_2 < c$, where c < ∞ is chosen so
that $P\{g_2 < c\} > 0$) that


(4.15) E(w;,)""! = o. 







The desired result (4.7) follows from (4.14) and (4.15). This completes the 
proof of Theorem 3. 


The following theorem can be proved in essentially the same manner as 
Theorem 3: 


THEOREM 4. If, for a positive integer N, an integer j (1 ≤ j ≤ s), and a positive k,


(4.16) $E(w_{Nj})^{k+1} = \infty$,
then


(4.17) $\int x_j^k \, dF(x) = \infty$.


Theorem 3 is a special case of Theorem 4 for the case N = 2, j = s. For then
(4.1) implies (4.16), and (4.17) implies (4.2). Let $M_i$ denote the ith smallest of
$R_1, \cdots, R_s$, and suppose


(4.18) $E(M_i)^{k+1} = \infty$.


Then (4.16) holds with N = s, 7 = 1. This also implies Theorem 3, for (4.1) 
implies (4.18) for i = s. Finally we remark that (4.18) with 7 = 1 implies 


(4.19) My, = O, 


5. Implications for the one-dimensional random walk. The results of the 
preceding sections imply not only results on the behavior of queues in general, 
but also results on the random walk in s-dimensional space. We shall content 
ourselves with pointing out two of these implications for the one-dimensional 
random walk, although the results for the s-dimensional walk obtained in 
earlier sections are more general and usually more difficult to prove. Without 
further remark all problems treated in this section are to be assumed to be 
one-dimensional. 


THEOREM 5. Let $u_1, u_2, \cdots$ be independent, identically distributed chance
variables. Let $S_n = \sum_{i=1}^{n} u_i$, and define

$v = \sup(0, S_1, S_2, S_3, \cdots)$.
If 
(5.1) < Eu, < 0, 
and, fork > 0, 
(5.2) 


then 


(5.3) 







THEOREM 6. With the definitions of Theorem 5, if
(5.4) $-\infty < E u_1 < 0$,
and, for k > 0,


$E(u_1^+)^{k+1} = \infty$,


then


$E v^k = \infty$.


PROOF. Consider the process: $w_1^* = u_1^+$, $w_{n+1}^* = (w_n^* + u_{n+1})^+$, n ≥ 1. Let
$F_n^*(z)$ be the d.f. of $w_n^*$, and let


$F^*(z) = \lim_{n \to \infty} F_n^*(z)$

when the latter exists. It was shown in [3] and follows from the results of [1]
for the case s = 1 that, when $u_n = R_n - g_{n+1}$, $F^*(z)$ exists, is a distribution
function, and equals the limiting d.f. F(z) of $w_n$. It was also shown in [3] that
the distribution function of v is then $F^*(z)$. An examination of the proofs of
these statements shows that they are valid for the process $\{w_n^*\}$ even when
$u_n$ is not of the form $R_n - g_{n+1}$, provided only that (5.1) is satisfied. An
examination of the proofs of Section 1 and Theorem 3 of the present paper shows
that they too hold even if $u_n$ is not of the form $R_n - g_{n+1}$. But then Theorem 6
is simply a restatement of Theorem 3.

It is sufficient to prove Theorem 5 for chance variables $\{u_n^*\}$, where $u_n^* =
\max(u_n, -T)$ and T > 0 is so large that $E u_n^* < 0$. But $u_n^* = (u_n^* + T) - T$
and is therefore of the form $R_n - g_{n+1}$, with $R_n = (u_n^* + T)$, $g_{n+1} = T$. Theorem
5 is then simply a restatement of Theorem 2.
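
The link between the recursion $w_{n+1}^* = (w_n^* + u_{n+1})^+$ and the supremum v of the walk can be illustrated on a fixed sequence. The sketch below is only an illustration, with made-up increments and hypothetical function names. It checks the elementary pathwise identity that $w_n^*$ equals $\max(0, S_1, \cdots, S_n)$ computed from the same increments taken in reverse order. For independent, identically distributed increments the reversed sequence has the same joint distribution, which is why the two quantities agree in law.

```python
from itertools import accumulate


def lindley(u):
    """Return w_1, ..., w_n with w_1 = max(u_1, 0), w_{m+1} = max(w_m + u_{m+1}, 0)."""
    w, path = 0.0, []
    for step in u:
        w = max(w + step, 0.0)
        path.append(w)
    return path


def sup_partial_sums(u):
    """Return max(0, S_1, ..., S_n) for the partial sums S_m of u."""
    return max(0.0, max(accumulate(u)))


u = [1.0, -2.0, 0.5, 0.5, -3.0, 2.0]     # made-up increments
w = lindley(u)
# w_n equals the running supremum of the walk built from the increments in reverse order
for n in range(1, len(u) + 1):
    assert w[n - 1] == sup_partial_sums(u[:n][::-1])
```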

While the results of the present paper on the queueing process and the
corresponding s-dimensional random walk are new, Theorems 5 and 6 on the
one-dimensional random walk were also obtained by Darling, Erdős, and Kakutani,
to whom the problem was communicated by us. These writers also obtained
other related results, and they have informed us that many of these
results are implicit in [4]. In the course of the present work we have had
interesting discussions with Professor Shizuo Kakutani.


6. The mean queue length. As in [1], Section 9, let $Q_i$ be the number of
individuals in the queue waiting to be served, just before the service of the ith
individual begins. To avoid trivial circumlocutions we assume G(0) = 0 (G(z)
is the d.f. of $g_1$). In [1] the limit D(z) of $D_n(z)$, the d.f. of $Q_n$, is shown to exist
and D(z) is explicitly given. We shall now be concerned with


$\bar{Q}_n = n^{-1} \sum_{i=1}^{n} Q_i$.


Let $\{w_n^0\}$ be the process defined in Section 1. We now construct a process
$\{w_n^0, Q_n^0\}$, where $Q_1^0, Q_2^0, \cdots$ remain to be defined. Let $t_n = \sum_{i=1}^{n} g_i$. We define







$Q_n^0$ to be equal to the number of indices i which satisfy
(6.1) $t_n < t_i \le t_n + w_{n1}^0$.


It follows that the process $\{w_n^0, Q_n^0\}$ is stationary and metrically transitive, so
that, by the ergodic theorem, $\bar{Q}_n^0 = n^{-1} \sum_{i=1}^{n} Q_i^0$ approaches a constant limit c,


$c = \int z \, dD(z)$,


with probability one. (It is easy to prove that c is contained between
$E w_{n1}^0 / E g_1 - 1$ and $E w_{n1}^0 / E g_1$.) Since $w_{n1} \le w_{n1}^0$ it follows from (6.1)
that $Q_n \le Q_n^0$. Hence


(6.2) $P\{\limsup \bar{Q}_n \le c\} = 1$.


Just as in Section 1, one proves that


(6.3) $P\{\liminf \bar{Q}_n \ge c\} = 1$.


Hence


(6.4) $P\{\lim \bar{Q}_n = c\} = 1$.
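
The counting rule (6.1) is easy to state in code. The following sketch is an illustration under stated assumptions: the arrival epochs and waiting times are invented numbers (the waits are not computed from service times), and `queue_counts` is a hypothetical helper name. It counts, for each n, the arrivals that fall in the half-open interval from $t_n$ to $t_n$ plus the wait of the nth individual, and then averages the counts.

```python
from bisect import bisect_right


def queue_counts(t, w):
    """For each n, count indices i with t[n] < t[i] <= t[n] + w[n].

    t : strictly increasing arrival epochs
    w : waiting time of each arrival before its service starts
    """
    return [bisect_right(t, tn + wn) - bisect_right(t, tn) for tn, wn in zip(t, w)]


t = [1.0, 2.0, 2.5, 4.0, 7.0]      # hypothetical arrival epochs
w = [0.0, 1.0, 2.0, 0.5, 0.0]      # hypothetical waits, not derived from service times
q = queue_counts(t, w)
q_bar = sum(q) / len(q)            # sample analogue of the average queue length
print(q, q_bar)                    # [0, 1, 1, 0, 0] 0.4
```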


The duration of busy periods. A busy period is a closed time interval, say
t' ≤ t ≤ t'', such that all s servers are occupied throughout this interval,
t'' − t' > 0, and the interval is maximal, i.e., if τ' ≤ t' < t'' ≤ τ'', τ'' − τ' > t'' − t',
then all s servers are not occupied for some time point in the interval (τ', τ'').
The length of the busy period is t'' − t', t' is its beginning, and t'' is its end.
Let $B_i$ be the sum of the lengths of all busy periods at or before $t_i$; if $t_i$ is in the
interior of a busy period, we count into $B_i$ the length of the interval from the
beginning of the period until $t_i$.

It is easy to verify that whether or not any time point t with $t_i < t < t_{i+1}$
is in a busy interval depends only on $w_i$, $R_i$, and $g_{i+1}$. Since the value of $B_n$
is unaffected by removing from busy periods any of the points $t_i$ (1 ≤ i ≤ n)
contained in them, it follows that the process


$\{B_n, w_n\}$, n = 1, 2, ...


is Markoffian.

Let $\{w_n^0\}$ be the process defined in Section 1. Define $B_1^0 = 0$. Define $B_n^0$, n ≥ 2,
to be the same function of the process $\{w_n^0\}$ as $B_n$ is of the process $\{w_n\}$. Since
$w_n^0 \ge w_n$ with probability one, it follows that $B_n^0 \ge B_n$ with probability one.
Since the process $\{w_n^0\}$ is stationary and metrically transitive, so is the process

$\{B_{n+1}^0 - B_n^0\}$, n = 1, 2, ...


Hence 


$P\{\lim_{n \to \infty} B_n^0 / n = E(B_2^0)\} = 1$.







In essentially the same manner as in Section 1 one proves easily that 


$P\{\lim_{n \to \infty} B_n / n = E(B_2^0)\} = 1$.


From this we obtain immediately that 


$P\{\lim_{n \to \infty} B_n / t_n = E(B_2^0) / E g_1\} = 1$.


This gives the long-term average time spent in busy periods.
The limiting distribution of the length of a busy period can be obtained in
a very tedious but straightforward manner from the marginal distributions of
the process $\{w_n^0\}$.


REFERENCES 

[1] J. Kiefer and J. Wolfowitz, "On the theory of queues with many servers," Trans.
Amer. Math. Soc., Vol. 78, 1, January 1955, pp. 1-18.

[2] J. V. Uspensky, "Introduction to mathematical probability," McGraw-Hill Book
Company, Inc., New York, 1937.

[3] D. V. Lindley, "The theory of queues with a single server," Proc. Cambridge Philos.
Soc., Vol. 48 (1952), Part 2, pp. 277-89.

[4] P. Erdős, "On a theorem of Hsu and Robbins," Ann. Math. Stat., Vol. 20 (1949), pp.
286-291.

[5] J. Wolfowitz, "The efficiency of sequential estimates etc.," Ann. Math. Stat., Vol. 18
(1947), pp. 215-230.

[6] J. L. Doob, "Stochastic processes," John Wiley & Sons, Inc., New York, 1953.

[7] G. H. Hardy, J. E. Littlewood, and G. Pólya, "Inequalities," Cambridge
University Press, Cambridge, 1934.





TOLERANCE REGIONS 


By D. A. S. Fraser AND IRWIN GUTTMAN 
University of Toronto 


1. Summary. In this paper definitions are given for three types of tolerance
regions. For distribution-free tolerance regions, an analytic condition is derived
for the characteristic function of the region. Examples of the application of the
condition are considered. For β-expectation tolerance regions, a criterion for a
good tolerance region is introduced, and it is shown that the problem of finding
such a tolerance region can be reduced to that of finding a good test for an
equivalent hypothesis-testing problem. Best tolerance regions are obtained for
a number of single variate and multivariate problems involving normal
distributions.


2. Introduction. Let $\mathcal{X}(\mathcal{A})$ be a measurable space and $\{P_X^\theta \mid \theta \in \Omega\}$ be a class
of probability measures defined over $\mathcal{X}(\mathcal{A})$. For the theory in this paper we
assume that an experiment corresponds to a sample of n from a component
experiment. Hence our sample space is $\mathcal{W} = \mathcal{X}^n$, and the probability measures
are the nth power product of the measures $\{P_X^\theta \mid \theta \in \Omega\}$. We designate these
measures by $\{P_W^\theta \mid \theta \in \Omega\}$.

A statistical tolerance region is a mapping from the sample space $\mathcal{W}$ to the
space of subsets $\mathcal{A}$ of the component space.

DEFINITION 2.1. A statistical tolerance region, $S(x_1, \cdots, x_n)$, is a statistic
defined over $\mathcal{W} = \mathcal{X}^n$ and taking values in the σ-algebra $\mathcal{A}$.

In application the statistician calculates from the outcome $(x_1, \cdots, x_n)$ a
region $S(x_1, \cdots, x_n)$ in the space $\mathcal{X}$ which is being sampled, and then makes
some probability or expectation statement about the probability measure of
this set.

We first consider distribution-free tolerance regions. Heretofore the term
"nonparametric" has generally been applied to these regions, but in accordance
with the use of the term "distribution free" in other branches of statistics, and
because these regions can also be considered for parametric problems, we prefer
the term distribution free.

DEFINITION 2.2. $S(x_1, \cdots, x_n)$ is a distribution-free tolerance region for
$\{P_X^\theta \mid \theta \in \Omega\}$ if the induced probability distribution of


$P_X^\theta(S(X_1, \cdots, X_n))$


corresponding to the measure $P_W^\theta$ over $\mathcal{X}^n$ is independent of the parameter
$\theta \in \Omega$.

Because the probability measure or coverage of a distribution-free tolerance 
region has a “known” distribution independent of the ‘‘unknown” parameter, 


Received December 16, 1954. 







the statistician is able to make a probability statement about the coverage of 
the region. 

The next definition is proposed more with a view toward the immediate 
requirements of a statistician. 


DEFINITION 2.3. $S(x_1, \cdots, x_n)$ is a β-content tolerance region at confidence
level C if


$\Pr_W^\theta\{P_X^\theta(S(X_1, \cdots, X_n)) \ge \beta\} \ge C$


for all $\theta \in \Omega$.

For such a region, the statistician has confidence C that the probability
content of the region $S(x_1, \cdots, x_n)$ is at least β, regardless of the measure being
sampled. Of course in some situations he may prefer that $S(x_1, \cdots, x_n)$ satisfy
the relation


Pro} Bi < P(S(X, a X,)) 


for all $\theta \in \Omega$.

The next type of tolerance region has had perhaps less attention from the 
applied statistician than it deserves. 

DEFINITION 2.4. $S(x_1, \cdots, x_n)$ is a β-expectation tolerance region if


$E_W^\theta\{P_X^\theta(S(X_1, \cdots, X_n))\} \le \beta$ for all $\theta \in \Omega$.


For such a region the average probability content of the region is at most β.

In hypothesis testing the reduction to similarity is sometimes helpful for 
finding a whole class of tests in convenient form. For tolerance regions, we 
therefore propose the following definition: 

DEFINITION 2.5. $S(x_1, \cdots, x_n)$ is a similar β-expectation tolerance region if


$E_W^\theta\{P_X^\theta(S(X_1, \cdots, X_n))\} = \beta$


for all $\theta \in \Omega$.

A similar β-expectation region can also be viewed as a β-confidence region
for a future observation from the distribution being sampled. For by noting
that $P_X^\theta(S(x_1, \cdots, x_n))$ is the probability that another observation falls in S
given $x_1, \cdots, x_n$, we see that the left-hand side of the expression in Definition
2.5 is the marginal probability of such an event. This probability is equal to
β; hence there is β confidence that the future observation falls in S.
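
As a small numerical illustration of Definitions 2.4 and 2.5, consider the region $S = (-\infty, \max_i x_i]$. For a continuous distribution its expected coverage is known to be n/(n + 1). The sketch below checks this by simulation for Uniform(0, 1) samples only; the choice of distribution, the sample size, the number of trials, and the function name are all assumptions made for the illustration.

```python
import random


def mean_coverage_upper_limit(n, trials, seed=0):
    """Monte Carlo estimate of E[P(S)] for S = (-inf, max(x_1..x_n)] under Uniform(0, 1).

    For Uniform(0, 1) data the coverage of (-inf, u] is u itself, so the
    coverage of S is simply the sample maximum.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.random() for _ in range(n))
    return total / trials


n = 9
estimate = mean_coverage_upper_limit(n, trials=20000)
print(estimate, n / (n + 1))       # the two numbers should be close
```

Read as a confidence statement about a future observation, this says that a tenth draw falls at or below the maximum of the first nine with probability 0.9.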


3. Distribution-free tolerance regions. For distribution-free tolerance regions
we are able to give a necessary and sufficient analytic condition. To do this we
need the definition of a characteristic function, $\varphi_y(x_1, \cdots, x_n)$, of a region
$S(x_1, \cdots, x_n)$:


(3.1) $\varphi_y(x_1, \cdots, x_n) = 1$ if $y \in S(x_1, \cdots, x_n)$,

$\varphi_y(x_1, \cdots, x_n) = 0$ if $y \notin S(x_1, \cdots, x_n)$,







where $y \in \mathcal{X}$. Then it is easily seen that

$P_X^\theta(S(x_1, \cdots, x_n)) = E_Y^\theta\{\varphi_Y(x_1, \cdots, x_n)\}$


where the expectation applies to the random variable Y with probability
measure $P_X^\theta$.

THEOREM 3.1. A necessary and sufficient condition that $S(x_1, \cdots, x_n)$ be a
distribution-free tolerance region is that there exist a sequence of real numbers
$\alpha_1, \alpha_2, \cdots$ such that


$\varphi_{y_1}(x_1, \cdots, x_n) - \alpha_1, \quad \varphi_{y_1}(x_1, \cdots, x_n)\,\varphi_{y_2}(x_1, \cdots, x_n) - \alpha_2, \quad \cdots$


are respectively unbiased estimates of zero over $\mathcal{X}^{n+1}, \mathcal{X}^{n+2}, \cdots$ for the power product
measures of $\{P_X^\theta \mid \theta \in \Omega\}$. The sequence $\alpha_1, \alpha_2, \cdots$ is the moment sequence for the
distribution of $V = P_X^\theta(S(X_1, \cdots, X_n))$, where the $X_i$ have measure $P_X^\theta$.

PROOF. A distribution-free tolerance region has the distribution function, say
$F_\theta(v)$, independent of θ. Now, since a distribution function on a bounded interval
is uniquely determined by the corresponding moment sequence and conversely
(see [1]), it is equivalent to state that the moment sequence for $F_\theta(v)$ is
independent of θ.

Letting $\alpha_r$ be the rth moment of $F_\theta(v)$, then


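$\alpha_r = \int_0^1 v^r \, dF_\theta(v)$

$= \int_{\mathcal{X}^n} [P_X^\theta(S(x_1, \cdots, x_n))]^r \prod_{i=1}^{n} dP_X^\theta(x_i)$

$= \int_{\mathcal{X}^n} [E_Y^\theta\{\varphi_Y(x_1, \cdots, x_n)\}]^r \prod_{i=1}^{n} dP_X^\theta(x_i)$

$= \int_{\mathcal{X}^n} \left[\int_{\mathcal{X}} \varphi_y(x_1, \cdots, x_n) \, dP_X^\theta(y)\right]^r \prod_{i=1}^{n} dP_X^\theta(x_i)$

$= \int_{\mathcal{X}^{n+r}} \prod_{j=1}^{r} \varphi_{y_j}(x_1, \cdots, x_n) \prod_{j=1}^{r} dP_X^\theta(y_j) \prod_{i=1}^{n} dP_X^\theta(x_i)$.


Therefore, $\prod_{j=1}^{r} \varphi_{y_j}(x_1, \cdots, x_n) - \alpha_r$ is an unbiased estimate of zero over $\mathcal{X}^{n+r}$.
Thus, the statement that $F_\theta(v)$ is independent of θ is equivalent to the existence
of the sequence $\alpha_1, \alpha_2, \cdots$ such that the above expression is an unbiased
estimate of zero for all r.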

For some theoretical developments it is convenient to have a definition of a
randomized tolerance region. Let Z be a random variable whose probability
measure is a measurable function of $x_1, \cdots, x_n$.

DEFINITION 3.1. $S(x_1, \cdots, x_n; z)$ is a randomized distribution-free tolerance
region for $\{P_X^\theta \mid \theta \in \Omega\}$ if the induced probability distribution of
$P_X^\theta(S(x_1, \cdots, x_n; z))$, corresponding to the measure $P_W^\theta$ over $\mathcal{X}^n$ and the random
variable Z for z, is independent of the parameter θ. It is assumed that
$S(x_1, \cdots, x_n; z)$ is a measurable function of $(x_1, \cdots, x_n, z)$.







As for the nonrandomized case, we define a function


(3.2) $\varphi_y(x_1, \cdots, x_n; z) = 1$ if $y \in S(x_1, \cdots, x_n; z)$,

$\varphi_y(x_1, \cdots, x_n; z) = 0$ if $y \notin S(x_1, \cdots, x_n; z)$.


Taking the expectation with respect to Z, we define a related function


(3.3) $\varphi_{y_1 \cdots y_r}(x_1, \cdots, x_n) = E_Z\left\{\prod_{j=1}^{r} \varphi_{y_j}(x_1, \cdots, x_n; Z)\right\}$,


a function which is characteristic of the tolerance region. Then we have the
extension of Theorem 3.1:

THEOREM 3.2. A necessary and sufficient condition that $S(x_1, \cdots, x_n; z)$ be a
distribution-free randomized tolerance region is that there exist a sequence of real
numbers $\alpha_1, \alpha_2, \cdots$ such that $\varphi_{y_1}(x_1, \cdots, x_n) - \alpha_1$, $\varphi_{y_1 y_2}(x_1, \cdots, x_n) - \alpha_2$, $\cdots$
are respectively unbiased estimates of zero over $\mathcal{X}^{n+1}, \mathcal{X}^{n+2}, \cdots$ for the power product
measures of $\{P_X^\theta \mid \theta \in \Omega\}$. The sequence $\alpha_1, \alpha_2, \cdots$ is the moment sequence for the
distribution of $V = P_X^\theta(S(X_1, \cdots, X_n; Z))$ where the $X_i$ have the measure $P_X^\theta$.

PROOF. This follows the method of proof given for Theorem 3.1.

We give now some examples of the application of the above theorem for the 
nonrandomized case. 

EXAMPLE 3.1. Consider sampling from an arbitrary discrete distribution on
the real line. We have $\mathcal{W} = R^n$, and the class of probability measures
is $\{P_X^\theta \mid \theta \in \Omega\}$, where θ here indexes the discrete distributions on $R^1$. We
establish that there do not exist distribution-free tolerance regions $S(x_1, \cdots, x_n)$
symmetric in the x's, other than the trivial tolerance region $S = \emptyset$ or $\mathcal{X}$.

Let $S(x_1, \cdots, x_n)$ be a distribution-free tolerance region which is symmetric
in the x's. We show that either $S(x_1, \cdots, x_n) = \emptyset$ or $S(x_1, \cdots, x_n) = \mathcal{X}$.

If $\varphi_y(x_1, \cdots, x_n)$ is the characteristic function of $S(x_1, \cdots, x_n)$, then by
Theorem 3.1 we have the existence of $\alpha_1, \alpha_2, \cdots$ such that


(3.4) $\prod_{j=1}^{r} \varphi_{x_{n+j}}(x_1, \cdots, x_n) - \alpha_r$

is an unbiased estimate of zero over $\mathcal{X}^{n+r}$.
For samples from $\mathcal{X}^n$ we define a statistic called the order statistic


$t(x_1, \cdots, x_n) = \{x_1, \cdots, x_n\}$.


This statistic gives the values of the x's in the outcome $(x_1, \cdots, x_n)$, but not
the order in which they occur in this outcome. Now it is easily shown that for
the class of power product measures over $\mathcal{X}^n$ considered here, this statistic is
sufficient. Halmos [2] has shown that $t(x_1, \cdots, x_n)$ is complete for the measures
above.

We have that (3.4) is an unbiased estimate of zero:


(3.5) $E^\theta\left\{\prod_{j=1}^{r} \varphi_{X_{n+j}}(X_1, \cdots, X_n) - \alpha_r\right\} = 0$.







Since $t(x_1, \cdots, x_{n+r})$ is a sufficient statistic, the expression


$E\left\{\prod_{j=1}^{r} \varphi_{X_{n+j}}(X_1, \cdots, X_n) - \alpha_r \,\Big|\, t(X) = T\right\}$


is independent of θ, that is, it is a statistic. But (3.5) can be written as


$E_T^\theta\left\{E\left\{\prod_{j=1}^{r} \varphi_{X_{n+j}}(X_1, \cdots, X_n) - \alpha_r \,\Big|\, t(X) = T\right\}\right\} = 0$,


where the first expectation operator applies to the induced distribution of
$t(x_1, \cdots, x_{n+r})$. From completeness over $\mathcal{X}^{n+r}$, we have


(3.6) $E\left\{\prod_{j=1}^{r} \varphi_{X_{n+j}}(X_1, \cdots, X_n) - \alpha_r \,\Big|\, t(X)\right\} = 0$


almost everywhere with respect to the induced measures of $t(x_1, \cdots, x_{n+r})$.
Since the class $\{P_X^\theta \mid \theta \in \Omega\}$ is the class of all discrete distributions, almost
everywhere means everywhere.

We consider (3.6) with r = 1. The conditional distribution given the statistic
$t(x_1, \cdots, x_{n+1})$ gives equal probability to all permutations of $(x_1, \cdots, x_{n+1})$.
Hence (3.6) with r = 1 becomes


(3.7) $\frac{1}{(n+1)!} \sum_P \varphi_{x_{i_{n+1}}}(x_{i_1}, \cdots, x_{i_n}) - \alpha_1 = 0$


everywhere; P designates summation with respect to all permutations $i_1, \cdots, i_{n+1}$
of $(1, \cdots, n+1)$. Since $S(x_1, \cdots, x_n)$ is symmetric in the x's, so also is
$\varphi_y(x_1, \cdots, x_n)$ symmetric in $x_1, \cdots, x_n$. Therefore (3.7) becomes

(3.8) $\varphi_{x_{n+1}}(x_1, \cdots, x_n) + \varphi_{x_n}(x_1, \cdots, x_{n-1}, x_{n+1}) + \cdots + \varphi_{x_1}(x_2, \cdots, x_{n+1}) = (n+1)\alpha_1$,


and (3.8) holds for all $x_1, \cdots, x_{n+1}$. Taking $x_1 = x_2 = \cdots = x_{n+1} = x$, we have


$(n+1)\varphi_x(x, \cdots, x) = (n+1)\alpha_1$.


The quantity $\varphi_x(x, \cdots, x)$ can be either zero or one. Hence $(n+1)\alpha_1 = 0$
or $(n+1)$; that is, $\alpha_1 = 0$ or $= 1$. Thus the first moment of a random variable
restricted to the interval [0, 1] is either zero or one. Obviously, the random variable
(the coverage of the tolerance region) takes the value zero or one with
probability one. Because the class of measures is the class of discrete distributions, this
means either that


$S(x_1, \cdots, x_n) = \emptyset$
or that


$S(x_1, \cdots, x_n) = \mathcal{X}$.







EXAMPLE 3.2. Consider sampling from an arbitrary absolutely continuous
distribution on the real line. We have $\mathcal{X} = R^1$ and we let θ in $\{P_X^\theta \mid \theta \in \Omega\}$
index the absolutely continuous distributions. We find the form of distribution-free
upper tolerance limits in special cases. Suppose a distribution-free tolerance
region $S(x_1, \cdots, x_n)$ has the form


(3.9) $S(x_1, \cdots, x_n) = \,]-\infty, u(x_1, \cdots, x_n)]$.


(Intervals are open at the end where a reversed bracket appears, and closed
otherwise.) Then $u(x_1, \cdots, x_n)$ is called a distribution-free upper tolerance
limit. We assume that $u(x_1, \cdots, x_n)$ is symmetric in $x_1, \cdots, x_n$.

For convenience we define $L_r$ to be the Lebesgue measure over $R^r$. As in
Example 3.1, it can easily be shown that the order statistic is sufficient for the class
of absolutely continuous distributions. It has been proved complete by Lehmann
[3] and Fraser [4]. Following the argument in Example 3.1, we obtain from
Theorem 3.1 with r = 1 that


$\frac{1}{(n+1)!} \sum_P \varphi_{x_{i_{n+1}}}(x_{i_1}, \cdots, x_{i_n}) = \alpha_1$


almost everywhere (Lebesgue) over $R^{n+1}$. Since $\varphi_y(x_1, \cdots, x_n)$ is symmetric in
the x's, we have


(3.10) $\varphi_{x_{n+1}}(x_1, \cdots, x_n) + \cdots + \varphi_{x_1}(x_2, \cdots, x_{n+1}) = (n+1)\alpha_1$


almost everywhere. Because φ is a characteristic function, $(n+1)\alpha_1$ is one of
the integers 0, 1, ..., n, n + 1. We find the form of $u(x_1, \cdots, x_n)$ when $(n+1)\alpha_1$
is 0, 1, n, or n + 1.

Consider the case $(n+1)\alpha_1 = 1$. We shall prove that


$\varphi_y(x_1, \cdots, x_n) = 1$ if $y \le x_{(1)}$,

$\varphi_y(x_1, \cdots, x_n) = 0$ if $y > x_{(1)}$,


almost everywhere in $R^{n+1}$. In terms of $u(x_1, \cdots, x_n)$, this means that


$u(x_1, \cdots, x_n) = x_{(1)}$


almost everywhere in $R^n$. Let $x_{(1)}, x_{(2)}, \cdots, x_{(n)}$ designate the numbers $x_1, \cdots, x_n$
arranged in order of increasing magnitude: $x_{(1)} \le \cdots \le x_{(n)}$.

Suppose $\varphi_y(x_1, \cdots, x_n) = 1$ on a set of positive Lebesgue measure in the
region of $R^{n+1}$ for which $y > x_{(1)}$. From the properties of measure, it follows that
there exists a positive δ such that $\varphi_y(x_1, \cdots, x_n) = 1$ on a set of positive
measure in the region $y > x_{(1)} + \delta$; call this set A. Divide the space $R^n$ of $(x_1, \cdots, x_n)$
into "cubes" having sides of length ε, a typical set being


(3.11) $\{(x_1, \cdots, x_n) \mid m_i \varepsilon < x_i \le (m_i + 1)\varepsilon;\ (i = 1, \cdots, n)\}$.


We let, of course, $m_i$ for each i range over all real integers. There is a countable
number of such sets. Consider the following set B:


$B = \{(x_1, \cdots, x_n) \mid L_1\{y \mid (x_1, \cdots, x_n, y) \in A\} > 0\}$.







From the properties of measure, there exists at least one of the above-defined
cubes which intersects B on a set of positive measure. For a later purpose we
require ε < δ.

Now by choosing ε sufficiently small we can ensure the existence of at least
one cube which intersects B on a set of positive measure and at the same time is
disjoint from each of the diagonal sets,


(3.12) $\{(x_1, \cdots, x_n) \mid x_i = x_j\}$,


which have measure zero. Designate one of these cubes by C. We summarize the
results so far. For $(x_1, \cdots, x_n) \in B \cap C$, we have $\varphi_y(x_1, \cdots, x_n) = 1$ at least
for y belonging to a set of positive measure in $y > x_{(1)} + \delta$.

From the definition of $\varphi_y(x_1, \cdots, x_n)$ we know it is monotone nonincreasing
in y and takes the values 0, 1. That is, if $\varphi_{y^*}(x_1, \cdots, x_n) = 1$, then
$\varphi_y(x_1, \cdots, x_n) = 1$ for $y < y^*$. Now from the last statement in the above
paragraph, it follows that if $(x_1, \cdots, x_n) \in B \cap C$, then $\varphi_y(x_1, \cdots, x_n) = 1$ for
$y \le x_{(1)} + \delta$.

We now derive a contradiction to (3.10). Without loss of generality, let $x_1$ be
the smallest of the co-ordinates for points in C (it will always be the same
co-ordinate because C does not intersect the diagonal sets (3.12)). Consider the
set D in $R^{n+1}$:


$D = \{(x_1, \cdots, x_{n+1}) \mid (x_1, \cdots, x_n) \in C \cap B;\ (x_{n+1}, x_2, \cdots, x_n) \in C \cap B\}$.


From the first condition defining D, we have $\varphi_{x_{n+1}}(x_1, \cdots, x_n) = 1$ for
$x_{n+1} \le x_1 + \delta$. From the second condition defining D, we have
$\varphi_{x_1}(x_{n+1}, x_2, \cdots, x_n) = 1$ for $x_1 \le x_{n+1} + \delta$. But for $(x_1, \cdots, x_{n+1}) \in D$ we have
$x_1$ and $x_{n+1}$ both as possible first co-ordinates for a point in C; hence
$|x_1 - x_{n+1}| < \varepsilon$. Therefore if $(x_1, \cdots, x_{n+1}) \in D$,
$\varphi_{x_{n+1}}(x_1, \cdots, x_n) = 1 = \varphi_{x_1}(x_{n+1}, x_2, \cdots, x_n)$, since the
two conditions above are fulfilled by reason of our choice of ε < δ.
For $(x_1, \cdots, x_{n+1}) \in D$ we have the left-hand side of (3.10) equal to at least 2,
while the right-hand side of (3.10) by assumption was 1. This is a contradiction
if we show D has positive measure.

Let $L_n$ designate Lebesgue measure in $R^n$ and let $\Psi(x_1, \cdots, x_n)$ be the
characteristic function of $B \cap C$ in $R^n$.


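(3.13) $L_{n+1}(D) = \int_{R^{n+1}} \Psi(x_1, \cdots, x_n)\, \Psi(x_{n+1}, x_2, \cdots, x_n) \prod_{i=1}^{n+1} dL_1(x_i)$

$= \int_{R^{n-1}} \left[\int_R \Psi(x_1, \cdots, x_n)\, dL_1(x_1)\right] \left[\int_R \Psi(x_{n+1}, x_2, \cdots, x_n)\, dL_1(x_{n+1})\right] \prod_{i=2}^{n} dL_1(x_i)$

$= \int_{R^{n-1}} M^2(x_2, \cdots, x_n) \prod_{i=2}^{n} dL_1(x_i)$.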





But


(3.14) $L_n(B \cap C) = \int_{R^n} \Psi(x_1, \cdots, x_n) \prod_{i=1}^{n} dL_1(x_i) = \int_{R^{n-1}} M(x_2, \cdots, x_n) \prod_{i=2}^{n} dL_1(x_i)$.


Since $L_n(B \cap C) > 0$ by construction of $B \cap C$, then by comparing (3.13) and
(3.14) we obtain $L_{n+1}(D) > 0$. This is the contradiction we worked toward.
Therefore our assumption that $\varphi_y(x_1, \cdots, x_n) = 1$ on a set of positive measure
in the region of $R^{n+1}$ for which $y > x_{(1)}$ was false.

We have that $\varphi_y(x_1, \cdots, x_n) = 1$ on a set of positive measure only if that
set is in $y \le x_{(1)}$. Thus for $x_{i_1} < \cdots < x_{i_{n+1}}$ there is only one term of (3.10)
which can be 1 on a set of positive measure. However the right-hand side of
(3.10) by assumption was 1 almost everywhere over $x_{i_1} < \cdots < x_{i_{n+1}}$. Therefore
$\varphi_{x_{i_1}}(x_{i_2}, \cdots, x_{i_{n+1}}) = 1$ almost everywhere when $x_{i_1} < \cdots < x_{i_{n+1}}$. That
is,


$\varphi_y(x_1, \cdots, x_n) = 1$ if $y \le x_{(1)}$,
$\varphi_y(x_1, \cdots, x_n) = 0$ if $y > x_{(1)}$,


and the distribution-free upper tolerance bound $u(x_1, \cdots, x_n)$ equals $x_{(1)}$.

Similarly if $(n+1)\alpha_1 = n$, then $u(x_1, \cdots, x_n) = x_{(n)}$, and by an almost
trivial argument $u(x_1, \cdots, x_n) = -\infty, +\infty$ according as $(n+1)\alpha_1 = 0$,
n + 1.
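
Relation (3.10) can be checked directly for order-statistic limits. The sketch below is an illustration only; the data values and the function name are made up, and the general case $u = x_{(r)}$ with 1 < r < n goes slightly beyond the cases 0, 1, n, n + 1 treated above. Given n + 1 distinct numbers, it counts how many of them lie at or below the rth smallest of the remaining n. The count is r whatever the data, which corresponds to $(n+1)\alpha_1 = r$ in (3.10).

```python
def leave_one_out_count(points, r):
    """Count j such that points[j] <= r-th smallest of the remaining points."""
    count = 0
    for j, y in enumerate(points):
        rest = sorted(points[:j] + points[j + 1:])
        if y <= rest[r - 1]:       # phi_y(rest) = 1 when y lies in (-inf, x_(r)]
            count += 1
    return count


points = [3.2, -1.0, 0.7, 5.5, 2.1]     # n + 1 = 5 distinct values, so n = 4
for r in range(1, len(points)):
    assert leave_one_out_count(points, r) == r
```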


4. β-Content tolerance regions. Any tolerance region satisfying Definition
2.1 will produce a β-content tolerance region for suitably chosen C; for example,

$C = \inf_{\theta \in \Omega} \Pr_W^\theta\{P_X^\theta(S(X_1, \cdots, X_n)) \ge \beta\}$.

Also, a distribution-free tolerance region will produce a β-content tolerance
region with a property of similarity. For if $S(x_1, \cdots, x_n)$ satisfies Definition
2.2, then, letting C equal the expression


$\Pr_W^\theta\{P_X^\theta(S(X_1, \cdots, X_n)) \ge \beta\}$,

which does not depend on θ, we have a similar β-content tolerance region given
by

$\Pr_W^\theta\{P_X^\theta(S(X_1, \cdots, X_n)) \ge \beta\} = C$.

5. β-expectation tolerance regions. First we prove some general properties of
β-expectation tolerance regions. In Section 3 we defined by Formula 3.1 the
characteristic function $\varphi_y(x_1, \cdots, x_n)$ of a nonrandomized tolerance region,
and by Formula 3.3 with r = 1 we defined a characteristic function
$\varphi_y(x_1, \cdots, x_n)$ of a randomized tolerance region. As a converse, we have the
following theorem.






THEOREM 5.1. If $\varphi_y(x_1, \cdots, x_n)$ is a measurable function with
$0 \le \varphi_y(x_1, \cdots, x_n) \le 1$, then there exists a tolerance region $S(x_1, \cdots, x_n)$ having
$\varphi_y(x_1, \cdots, x_n)$ as its characteristic function.

PROOF. Let Z be a random variable which has the uniform distribution on
(0, 1] and define a randomized tolerance region by


$S'(x_1, \cdots, x_n; z) = \{y \mid \varphi_y(x_1, \cdots, x_n) \ge z\}$.


Now we calculate the characteristic function of $S'(x_1, \cdots, x_n; z)$ and using
(3.2) obtain


$\varphi_y'(x_1, \cdots, x_n) = E_Z\{\varphi_y'(x_1, \cdots, x_n; Z)\}$
$= \Pr_Z\{\varphi_y(x_1, \cdots, x_n) \ge Z\}$
$= \varphi_y(x_1, \cdots, x_n)$.

This proves the theorem.
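
The key step of the proof, that $\Pr_Z\{\varphi \ge Z\} = \varphi$ when Z is uniform on (0, 1], can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: the value 0.3, the number of trials, the seed, and the function name are arbitrary choices. It fixes a value φ in [0, 1], draws Z repeatedly, and reports how often the point y would be included in the randomized region.

```python
import random


def randomized_indicator_mean(phi, trials, seed=0):
    """Estimate Pr{phi >= Z} for Z uniform on (0, 1]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        z = 1.0 - rng.random()     # random() lies in [0, 1), so z lies in (0, 1]
        if phi >= z:
            hits += 1
    return hits / trials


print(randomized_indicator_mean(0.3, trials=50000))   # should be near 0.3
```

Drawing Z from (0, 1] rather than [0, 1) means a point with φ = 0 is never included and a point with φ = 1 is always included.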
We also state a theorem on similar β-expectation tolerance regions.
THEOREM 5.2. A necessary and sufficient condition that $S(x_1, \cdots, x_n; z)$ be a
similar β-expectation tolerance region is that $\varphi_y(x_1, \cdots, x_n) - \beta$ be an unbiased
estimate of zero for the power product measure of $P_X^\theta$ over $\mathcal{X}^{n+1}$.
PROOF. Let $S(x_1, \cdots, x_n; z)$ be a tolerance region; then the expected content
is

(5.1) $E_{W,Z}^\theta\{P_X^\theta(S(X_1, \cdots, X_n; Z))\}$.
From the definition of $\varphi_y(x_1, \cdots, x_n)$ this becomes

(5.2) $E_{W,Y}^\theta\{\varphi_Y(X_1, \cdots, X_n)\}$.


Obviously, then, a necessary and sufficient condition that (5.1) be equal to β
is that $\varphi_y(x_1, \cdots, x_n) - \beta$ be an unbiased estimate of zero.

To introduce the notion of a good tolerance region, we need a function which
gives us for each θ in Ω the relative merits of sets S in $\mathcal{A}$. Let the "desirability"
of a set S when the probability measure is $P_X^\theta$ be given by a probability measure
$Q_\theta(S)$ defined for all $S \in \mathcal{A}$. Then for a tolerance region S we define the power
to be


(5.3) $E_W^\theta\{Q_\theta(S(X_1, \cdots, X_n))\}$;


it is the average value of the "desirability" of the set S and in general is a
function of θ. In terms of the characteristic function of $S(x_1, \cdots, x_n)$, the
β-expectation condition is


(5.4) $\int_{\mathcal{X}^{n+1}} \varphi_y(x_1, \cdots, x_n)\, dP_X^\theta(y) \prod_{i=1}^{n} dP_X^\theta(x_i) \le \beta$,


and the power is


(5.5) $\int_{\mathcal{X}^{n+1}} \varphi_y(x_1, \cdots, x_n)\, dQ_\theta(y) \prod_{i=1}^{n} dP_X^\theta(x_i)$.







The problem of finding a good tolerance region is then to find a characteristic
function satisfying the size condition (5.4) and having good properties for the power
(5.5). Obviously, this is equivalent to finding a good test function $\varphi_y(x_1, \cdots, x_n)$
for the hypothesis testing problem, over $\mathcal{X}^{n+1}$,


(5.6) Hypothesis: $(P_X^\theta, \cdots, P_X^\theta, P_X^\theta)$, $\theta \in \Omega$;

Alternative: $(P_X^\theta, \cdots, P_X^\theta, Q_\theta)$, $\theta \in \Omega$;


$(P_X^\theta, \cdots, P_X^\theta, Q_\theta)$, for example, designates the probability measure of
$(X_1, \cdots, X_n, Y)$ over $\mathcal{X}^{n+1}$ where $X_1, \cdots, X_n, Y$ are independent, each $X_i$
has probability measure $P_X^\theta$, and Y has probability measure $Q_\theta$.

For the hypothesis testing problem there may exist a uniformly most powerful 
test. In this case we would call the corresponding tolerance region most power- 
ful. Failing the existence of a most powerful test, we could look for one yielding 
a maximum value to the minimum power over the alternative. The correspond- 
ing tolerance region we would then call minimax. 


6. β-expectation tolerance regions for normal distributions.
6.1. Univariate normal. Consider sampling from the univariate normal distribution
with density function

$(2\pi\sigma^2)^{-1/2} \exp\left[-\frac{1}{2\sigma^2}(x - \mu)^2\right]$,

where the parameter space $\Omega$ is given by $\mu \in R^1$, $\sigma^2 \in\ ]0, \infty[$. If a tolerance region
is desired which tends to cover the center of the distribution more than the tails,
then a reasonable choice of the measure $Q_{\mu,\sigma^2}(A)$ on the real line might be the
normal probability measure with mean $\mu$ and variance $a_1^2\sigma^2$ with $0 < a_1 < 1$.
This measure obviously gives more measure to sets in the neighbourhood of
$\mu$ and less to sets far from $\mu$.

We now consider the analogous hypothesis testing problem. Let $X_1, \dots, X_n$,
$Y$ be independent and let $X_i$ have a normal distribution with mean $\mu$ and variance
$\sigma^2$ and $Y$ have a normal distribution with mean $\mu$, variance $a^2\sigma^2$. The hypothesis
testing problem is of the form

(6.1.1)  Hypothesis:  $a = 1$, $(\mu, \sigma^2) \in \Omega$;
         Alternative: $a = a_1$, $(\mu, \sigma^2) \in \Omega$.

If we define $\bar{x} = n^{-1} \sum x_i$ and $s_x^2 = (n - 1)^{-1} \sum (x_i - \bar{x})^2$, then it is easily
seen that this problem has the sufficient statistic $(\bar{x}, s_x^2, y)$.

We now apply the invariance method to the problem expressed in terms of
the sufficient statistic. Consider the group $G$ of transformations induced by the
two groups

(6.1.2)  $G_1$:  $\bar{x}' = \bar{x} + b$,  $s_x' = s_x$,  $y' = y + b$,  $-\infty < b < \infty$;

(6.1.3)  $G_2$:  $\bar{x}' = c\bar{x}$,  $s_x' = cs_x$,  $y' = cy$,  $c > 0$.

Obviously, $G_1$ is a normal subgroup of $G$. For the group of transformations $G_1$,
a maximal invariant function is $((y - \bar{x}), s_x^2)$. The group $G_2$ induces a group on
this maximal invariant, and it has maximal invariant $T = (y - \bar{x})/s_x$. By a
theorem of Hunt and Stein [6] $T$ is maximal invariant for $G$.

In accordance with the invariance principle we look for tests based on $T$. Since
the variance of $(y - \bar{x})$ is $(a^2 + 1/n)\sigma^2$, then under the hypothesis $T$ has the
distribution of $(1 + 1/n)^{1/2}\, t$, and under the alternative, the distribution of
$(a_1^2 + 1/n)^{1/2}\, t$, where $t$ stands for a random variable with Student's $t$-distribution
having $(n - 1)$ degrees of freedom. In terms of $T$, the hypothesis and alternative
are simple. To find the most powerful invariant test, we now apply the
Neyman-Pearson fundamental lemma. Let $c_0 = (1 + 1/n)^{1/2}$, $c_1 = (a_1^2 + 1/n)^{1/2}$
(clearly $c_0 > c_1$). Then, the most powerful test function $\phi(T)$ is based on the
probability ratio

$\frac{c_0}{c_1} \left[ \frac{1 + T^2/((n - 1)c_0^2)}{1 + T^2/((n - 1)c_1^2)} \right]^{n/2}$,

or equivalently on $|T|^{-1}$. Hence the most powerful invariant test function is

(6.1.4)  $\phi(T) = 1$ if $|T| < a_\beta$,
         $\phi(T) = 0$ if $|T| > a_\beta$,

and to give the test size $\beta$, $a_\beta$ is $(1 + 1/n)^{1/2}\, t_{(1-\beta)/2}$, where $t_\alpha$ is the point exceeded
with probability $\alpha$ using the Student $t$-distribution with $(n - 1)$ degrees of
freedom.

Since the alternative in (6.1.1) is a set of the maximal invariant partition of the
parameter space, the envelope power function is constant valued over the alternative.
By the theorem of Hunt and Stein [5], there is, for any non-invariant
test, an invariant test for which the minimum power over the maximal invariant
partitions of the parameter space is no smaller. Hence our most powerful invariant
test maximizes the minimum power over the alternative among size
$\beta$ tests. Also, it is most stringent.







From the definition of $T$ and $\phi(T)$ we have the test

$\phi_y(x_1, \dots, x_n) = 1$ if $|y - \bar{x}|/s_x < a_\beta$,
$\phi_y(x_1, \dots, x_n) = 0$ if $|y - \bar{x}|/s_x > a_\beta$.

Thus, the minimax and most stringent tolerance region is

$S(x_1, \dots, x_n) = [\bar{x} - a_\beta s_x,\ \bar{x} + a_\beta s_x]$.

Values of $a_\beta$ are given in Table 1. It is interesting to note that the tolerance region
does not depend on the value of $a_1$, provided it is less than 1. Also, under
the hypothesis, the test statistic has a fixed distribution; hence the test and therefore
the tolerance region are similar, and we have a $\beta$-expectation similar tolerance
region.

If we are interested in having our tolerance region cover the left tail of the
distribution, we might choose $Q_{\mu,\sigma^2}$ to be the normal distribution with mean
$\mu - e$ and variance $\sigma^2$ $(e > 0)$. An analysis similar to the above shows that a
minimax and most stringent tolerance region is

$S(x_1, \dots, x_n) =\ ]-\infty,\ \bar{x} + a'_\beta s_x]$,

where $a'_\beta$ may be found from Table 1 by using $a'_\beta = a_{2\beta-1}$.

If $\sigma^2$ is known, our parameter space is given by $\mu \in R^1$. Using the same $Q$ functions
with $\sigma^2$ taking the given value, an analysis similar to that above shows that
for ability to pick up the center of the distribution, the minimax and most
stringent tolerance region is

$S(x_1, \dots, x_n) = [\bar{x} - b_\beta\sigma,\ \bar{x} + b_\beta\sigma]$,

where $b_\beta = (1 + 1/n)^{1/2}\, z_{(1-\beta)/2}$ and $z_\alpha$ is the point exceeded with probability $\alpha$
using the normal distribution with mean 0 and variance 1. Values of $b_\beta$ are given
in Table 2. Also, the minimax and most stringent tolerance region of size $\beta$
which tends to pick up the left-hand tail of the distribution is

$S(x_1, \dots, x_n) =\ ]-\infty,\ \bar{x} + b'_\beta\sigma]$,

where values of $b'_\beta$ may be found from Table 2 by using $b'_\beta = b_{2\beta-1}$.
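As an illustration of how the known-variance factors can be evaluated, the following is a minimal sketch using only the Python standard library. The function names are hypothetical and are not part of the paper; the sketch assumes the formula $b_\beta = (1 + 1/n)^{1/2} z_{(1-\beta)/2}$ and the one-sided rule $b'_\beta = b_{2\beta-1}$ as stated above.

```python
from math import sqrt
from statistics import NormalDist


def tolerance_factor_known_variance(n: int, beta: float) -> float:
    """Two-sided factor b_beta = sqrt(1 + 1/n) * z_{(1 - beta)/2}."""
    # z_alpha is the point exceeded with probability alpha under N(0, 1),
    # i.e. the (1 - alpha)-quantile of the standard normal distribution.
    alpha = (1.0 - beta) / 2.0
    z = NormalDist().inv_cdf(1.0 - alpha)
    return sqrt(1.0 + 1.0 / n) * z


def one_sided_factor_known_variance(n: int, beta: float) -> float:
    """Left-tail factor b'_beta = b_{2*beta - 1} (requires beta > 1/2)."""
    return tolerance_factor_known_variance(n, 2.0 * beta - 1.0)


def central_region(sample, sigma: float, beta: float):
    """Region [xbar - b_beta*sigma, xbar + b_beta*sigma] for a known sigma."""
    n = len(sample)
    xbar = sum(sample) / n
    b = tolerance_factor_known_variance(n, beta)
    return (xbar - b * sigma, xbar + b * sigma)
```

For example, `tolerance_factor_known_variance(10, 0.99)` can be compared with the entry for $n = 10$, $\beta = .99$ in Table 2; small differences in the last printed digit are to be expected from rounding.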

If $\mu$ is known, the parameter space is given by $\sigma \in\ ]0, \infty[$. Using the same
$Q$ functions as before with $\mu$ taking the given value, a minimax and most stringent
size $\beta$ tolerance region for picking up the center of the distribution is

$S(x_1, \dots, x_n) = [\mu - t_{(1-\beta)/2}\, s_x,\ \mu + t_{(1-\beta)/2}\, s_x]$,

where $s_x^2$ is here defined to be $n^{-1} \sum (x_i - \mu)^2$; $t_\alpha$ is the point exceeded with
probability $\alpha$ using Student's $t$-distribution with $n$ degrees of freedom. Also,
the minimax and most stringent size $\beta$ tolerance region which tends to pick up
the left-hand tail is

$S(x_1, \dots, x_n) =\ ]-\infty,\ \mu + t_{1-\beta}\, s_x]$,

where $t_\alpha$ and $s_x$ are defined immediately above.


TABLE 1

Tolerance factors $a_\beta$ for univariate normal distributions with unknown mean,
unknown variance; sample size n

[Tabulated values are illegible in this copy.]


6.2. Multivariate normal. Consider sampling from a multivariate normal distribution
for which the density function is

$K \exp[-\tfrac{1}{2}(w - \mu)\Lambda(w - \mu)']$.




















TABLE 2

Tolerance factors $b_\beta$ for univariate normal distributions with unknown mean,
known variance; sample size n

                          β
    n     .995    .99    .975    .95     .90     .75
    2    3.438   3.155   2.745   2.401   2.015   1.409
    3    3.241   2.974   2.588   2.263   1.899   1.328
    4    3.138   2.880   2.506   2.191   1.839   1.286
    5    3.075   2.822   2.455   2.147   1.802   1.260
    6    3.032   2.782   2.421   2.117   1.777   1.242
    7    3.001   2.754   2.396   2.095   1.758   1.230
    8    2.977   2.732   2.377   2.079   1.745   1.220
    9    2.959   2.715   2.363   2.066   1.734   1.213
   10    2.944   2.702   2.351   2.056   1.725   1.206
   11    2.932   2.690   2.341   2.047   1.718   1.201
   12    2.922   2.681   2.333   2.040   1.712   1.197
   13    2.913   2.673   2.326   2.034   1.707   1.194
   14    2.906   2.666   2.320   2.029   1.703   1.191
   15    2.899   2.660   2.315   2.024   1.699   1.188
   16    2.893   2.655   2.310   2.020   1.696   1.186
   17    2.888   2.650   2.306   2.017   1.693   1.184
   18    2.884   2.646   2.303   2.014   1.690   1.182
   19    2.880   2.643   2.300   2.011   1.688   1.180
   20    2.876   2.639   2.297   2.008   1.686   1.179
   21    2.873   2.636   2.294   2.006   1.684   1.177
   22    2.870   2.634   2.292   2.004   1.682   1.176
   23    2.867   2.631   2.290   2.002   1.680   1.175
   24    2.865   2.629   2.288   2.000   1.679   1.174
   25    2.863   2.627   2.286   1.999   1.677   1.173
   26    2.860   2.625   2.284   1.997   1.676   1.172
   27    2.859   2.623   2.283   1.996   1.675   1.171
   28    2.857   2.621   2.281   1.995   1.674   1.171
   29    2.855   2.620   2.280   1.994   1.673   1.170
   30    2.853   2.618   2.278   1.992   1.672   1.169

   31    2.852   2.617   2.277   1.991   1.671   1.169
   41    2.841   2.607   2.269   1.984   1.665   1.164
   61    2.830   2.597   2.260   1.976   1.658   1.160
  121    2.819   2.586   2.251   1.968   1.652   1.155
    ∞    2.807   2.576   2.241   1.960   1.645   1.150





Let the parameter space $\Omega$ be given by $\mu \in R^k$ and $\Lambda$ belonging to the space of
$k \times k$ symmetric positive definite matrices. If a tolerance region is wanted which
tends to cover the center of the distribution rather than the extremities, then
for the parameters $\mu$, $\Lambda$ a reasonable choice for the $Q_{\mu,\Lambda}(A)$ measure over $R^k$ is
the normal distribution with mean $\mu$ and covariance matrix $a_1^2\Lambda^{-1}$ with
$0 < a_1 < 1$.


We now formulate the hypothesis testing problem which corresponds to this
problem in tolerance region construction. Let $W_1, \dots, W_n$, $\Xi$ be independent
and let each $W_\alpha$ have a normal distribution with mean $\mu$ and covariance matrix
$\Lambda^{-1}$, and let $\Xi$ have a normal distribution with mean $\mu$ and covariance matrix
$a^2\Lambda^{-1}$. Then the problem is to find a best size $\beta$ test for the problem

Hypothesis:  $a = 1$, $(\mu, \Lambda) \in \Omega$,
Alternative: $a = a_1$, $(\mu, \Lambda) \in \Omega$.

Defining $\bar{w} = n^{-1} \sum_\alpha w_\alpha$ and $A = (n - 1)^{-1} \sum_\alpha (w_\alpha - \bar{w})'(w_\alpha - \bar{w})$, we
have as sufficient statistic for this problem, $(\bar{w}, A, \xi)$.


TABLE 3

Tolerance factors $c_\beta$ for bi-variate normal distributions with unknown means,
unknown variance-covariance matrix; sample size n

[Tabulated values are illegible in this copy.]




We apply the invariance method. Consider the group $G_1$ of transformations
on the sample space $R^{k(n+1)}$

$G_1$:  $w_\alpha' = w_\alpha B + \zeta$ $(\alpha = 1, \dots, n)$,  $\xi' = \xi B + \zeta$;
        $B$ belongs to the class of nonsingular $k \times k$ matrices, and $\zeta \in R^k$.

These transformations leave the problem unchanged. The induced group on the
space of the statistic $(\bar{w}, A, \xi)$ is

$G_2$:  $\bar{w}' = \bar{w}B + \zeta$,  $\xi' = \xi B + \zeta$,  $A' = B'AB$;
        $B$ nonsingular, $\zeta \in R^k$.

It is straightforward to show that a maximal invariant statistic is

$T = (\xi - \bar{w}) A^{-1} (\xi - \bar{w})'$.


TABLE 4

Tolerance factors $c_\beta$ for tri-variate normal distributions with unknown means,
unknown variance-covariance matrix; sample size n

[Tabulated values are illegible in this copy.]


TABLE 5

Tolerance factors $c_\beta$ for quadri-variate normal distributions with unknown means,
unknown variance-covariance matrix; sample size n

[Tabulated values are illegible in this copy.]



The problem as interpreted for the induced distributions of the statistic $T$ has
a simple hypothesis and a simple alternative. By applying the Neyman-Pearson
fundamental lemma, a short analysis shows that the most powerful invariant
test is

$\phi(T) = 1$ if $T < c_\beta$,
$\phi(T) = 0$ if $T > c_\beta$.

Under the hypothesis, $T(1 + 1/n)^{-1}$ has the distribution of Hotelling's $T^2$
with $(n - 1)$ degrees of freedom. The probability density function of $T^2$ with
$(n - 1)$ degrees of freedom is

(6.2.1)  $\frac{\Gamma(n/2)}{\Gamma(k/2)\,\Gamma((n - k)/2)} \cdot \frac{(T^2/(n - 1))^{(k-2)/2}}{(1 + T^2/(n - 1))^{n/2}}\ d(T^2/(n - 1))$.

But if we make the transformation $T^2 = [(n - 1)k/(n - k)]\,F$, (6.2.1) is easily
seen to become the probability density function of Fisher's $F$-distribution with
$k$, $n - k$ degrees of freedom. Hence, to give the test and consequently the tolerance
region size $\beta$, we take

$c_\beta = (1 + 1/n)(n - 1)[k/(n - k)]\, F_{1-\beta}$,

where $F_\alpha$ is the point exceeded with probability $\alpha$ using the $F$-distribution with
$k$, $n - k$ degrees of freedom.

Now, by the same argument used for the univariate case, the minimax and
most stringent size $\beta$ tolerance region for the $k$-variate normal distribution is the
ellipsoidal region given by

$\{\xi \mid (\xi - \bar{w}) A^{-1} (\xi - \bar{w})' \le c_\beta\}$.

Values of $c_\beta$ for $k = 2, 3, 4$ are given in Tables 3, 4, and 5.




REFERENCES 


[1] M. G. Kendall, The Advanced Theory of Statistics, Vol. 1, Charles Griffin & Company,
Ltd., London, 1948.

[2] P. Halmos, "The theory of unbiased estimation," Ann. Math. Stat., Vol. 17 (1946),
pp. 34-39.

[3] E. L. Lehmann, "Notes on the theory of estimation," University of California Press,
1950.

[4] D. A. S. Fraser, "Completeness of order statistics," Canadian J. Math., Vol. 6 (1953),
pp. 42-45.

[5] E. L. Lehmann, "Some principles of the theory of hypothesis testing," Ann. Math.
Stat., Vol. 21 (1950), pp. 1-26.

[6] E. L. Lehmann, "Theory of testing hypothesis," lecture notes published at the
University of California, 1949.





GENERALIZED TOLERANCE LIMITS 


By J. H. B. Kemperman
Purdue University 

1. Summary. A method for constructing tolerance limits due to Fraser [8] is 
generalized by allowing that each step of the construction may depend not only 
on the blocks previously formed but also on all the known boundary observa- 
tions and, moreover, on certain sets of indices. Furthermore, Tukey’s [5] lexico- 
graphical ordering is replaced by a more general type of ordering. 

2. Introduction. Let $\{\Omega, \mathfrak{A}, \mu(A)\}$ be a complete measure space with $\mu(\Omega) = 1$.
Then the relation $P(X \in A) = \mu(A)$ $(A \in \mathfrak{A})$ defines a random variable
$X$ taking values in the space $\Omega$. Let $W = (x_1, \dots, x_n)$ be a set of $n$ independent
observations on $X$ and let $D_j = D_j(W)$ $(j = 1, 2, \dots)$ be disjoint measurable
subsets of $\Omega$ depending on $W$. These $D_j$ sets are called (nonparametric) tolerance
limits when the joint distribution of the random "coverages" $\mu(D_j)$ does not
depend on the true distribution $\mu(A)$ of $X$, given that the latter belongs to a
certain rather wide class of probability measures. Such tolerance limits were
first introduced by S. S. Wilks [1] whose method was generalized to a far extent
by A. Wald, H. Scheffé, J. W. Tukey, R. Wormleighton, and D. A. S. Fraser
([2], [3], [4], [5], [6], [7], [8]).


3. Ordering. By a (generalized) ordering $o$ in $\Omega$ we shall mean an assignment
of exactly one of the relations $x_1 < x_2$, $x_1 \sim x_2$, or $x_1 > x_2$ to each pair $x_1$, $x_2$
of points in $\Omega$, such that $x_1 \sim x_2$ is an equivalence relation and such that $o$ induces
an (ordinary) transitive ordering among the corresponding equivalence
classes. Let $\Omega = A \cup B$ with $A < B$ in the obvious sense. We shall assume that
always: (i) $A$ is measurable. (ii) If $A$ is non-empty, we have $A = \bigcup_k \{x \mid x \le a_k\}$
for some (at most denumerable) subsequence $\{a_k\}$ of $A$. Similarly, if $B$ is non-empty,
we have $A = \bigcap_k \{x \mid x < b_k\}$ for some subsequence $\{b_k\}$ of $B$.

One way of obtaining such a generalized ordering is as follows: Let $M$ be a
finite or denumerable well-ordered set and let, for each $m$ in $M$, $g_m(x)$ be a real-valued
measurable function on $\Omega$. If $g_m(x_1) = g_m(x_2)$ for all $m$ in $M$, we define
$x_1 \sim x_2$. Otherwise, $x_1 < x_2$ if and only if $g_s(x_1) < g_s(x_2)$, where $s$ is the smallest
index such that $g_s(x_1) \ne g_s(x_2)$.
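The ordering just described can be sketched in a few lines of Python for a finite index set $M$. This is only an illustration: the function `compare` and the two example functions `radius_sq` and `first_coord` are hypothetical names introduced here, and the denumerable case is not covered.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]


def compare(x1, x2, gs: Sequence[Callable]) -> int:
    """Return -1 if x1 < x2, 0 if x1 ~ x2, and +1 if x1 > x2.

    gs lists the functions g_m in the well-ordering of the (finite) set M.
    """
    for g in gs:
        a, b = g(x1), g(x2)
        if a != b:
            # g is g_s, the first function that separates the two points
            return -1 if a < b else 1
    return 0  # g_m(x1) == g_m(x2) for every m in M


def radius_sq(p: Point) -> float:
    """Example g_1: squared distance from the origin."""
    return p[0] ** 2 + p[1] ** 2


def first_coord(p: Point) -> float:
    """Example g_2: the first coordinate."""
    return p[0]
```

With `gs = [radius_sq]` alone, the points (3, 4) and (4, 3) are equivalent without being equal; adding `first_coord` as a second function separates them.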

An ordering $o$ is said to be continuous (with respect to the measure $\mu(A)$)
when for each $x_0$ in $\Omega$ we have $\mu\{x \mid x \sim x_0\} = 0$.

Lemma 1. Let $o$ be a continuous ordering and let $q(x_0) = P(X < x_0) = \mu\{x \mid x < x_0\}$.
Then $q(X)$ is a uniformly distributed random variable in $[0, 1]$.

Proof. Let $0 \le q \le 1$, $A = \{x \mid q(x) \le q\}$, and $B = \{x \mid q(x) > q\}$. We
have to show that

$P(q(X) \le q) = P(X \in A) = \mu(A) = q$.


Received September 11, 1954, revised January 29, 1955. 




But for $a_k$ in $A$, we have $\mu\{x \mid x \le a_k\} = \mu\{x \mid x < a_k\} = q(a_k) \le q$; hence,
from (ii), $\mu(A) \le q$ whether or not $A$ is empty. Moreover, for $b_k$ in $B$, we have
$\mu\{x \mid x < b_k\} = q(b_k) > q$; hence, from (ii), $\mu(A) \ge q$ whether or not $B$ is empty.


4. Partitioning. Let $m$, $m_0$, and $m_1$ be positive integers, $m = m_0 + m_1$. Let
$x_1, \dots, x_{m-1}$ be $m - 1$ points in a measurable subset $D$ of $\Omega$ and let $o$ be a given
ordering. Denoting by $x^*$ the $m_0$-th smallest (= $m_1$-th largest) point $x_i$ with
respect to $o$, the partition of $D$ into the three disjoint subsets $D_0 = \{x \mid x < x^*,
x \in D\}$, $D_1 = \{x \mid x > x^*, x \in D\}$, and $D^* = \{x \mid x \sim x^*, x \in D\}$ is called the
$(m_0, m_1)$-partition of $D$ with respect to $o$ and to the $m - 1$ points $x_i$ in $D$. Note
that, when $x_i \sim x_j$ does not happen for $i \ne j$, the "boundary" element $x^*$ is
unique, while $D_0$, $D_1$, and $D^*$ contain exactly $m_0 - 1$, $m_1 - 1$, and 1 elements
$x_i$, respectively. If $o$ is continuous, $\mu(D^*) = 0$, hence, $\mu(D) = \mu(D_0) + \mu(D_1)$.

Lemma 2. If $\mu(D) > 0$ we assume that $o$ is continuous and that $x_1, \dots, x_{m-1}$
are $m - 1$ independent observations on $X$ restricted to $X \in D$. Then $\mu(D_0) =
q\mu(D)$, where $q$ is a random variable which has the incomplete Beta-function
$I_q(m_0, m_1)$ as its cumulative distribution function.

Proof. We may assume that $\mu(D) > 0$. Let $Y$ be the random variable whose
distribution $\nu(A) = \mu(A)\mu(D)^{-1}$ $(A \subset D)$ is that of $X$ restricted to $X \in D$.
Observing that $o$ induces an ordering on $D$ which is continuous with respect to
$\nu(A)$, it follows from Lemma 1 (replacing $\Omega$ by $D$, and $X$ by $Y$) that for $q(x_0) =
\nu\{x \mid x < x_0, x \in D\}$ the variable $q(Y)$ is uniformly distributed in $[0, 1]$. Hence,
$q(x^*)$ is the $m_0$-th smallest among $m - 1 = m_0 + m_1 - 1$ independent observations
$q(x_i)$ on a uniformly distributed random variable in $[0, 1]$. This proves
that $q(x^*) = \mu(D_0)\mu(D)^{-1}$ has the d.f. $I_q(m_0, m_1)$.
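The statement of Lemma 2 can be checked numerically in the simplest setting. The sketch below assumes $D = \Omega = [0, 1]$, $X$ uniform, and the usual ordering of the reals, so that $\mu(D_0)$ is just the value of $x^*$; the function names are hypothetical. For integer arguments the incomplete Beta-function is evaluated through the binomial tail identity $I_q(m_0, m_1) = \sum_{j \ge m_0} \binom{m-1}{j} q^j (1-q)^{m-1-j}$.

```python
import random
from math import comb


def incomplete_beta_int(q: float, m0: int, m1: int) -> float:
    """I_q(m0, m1) for positive integers m0, m1 via the binomial tail."""
    k = m0 + m1 - 1  # number of observations, m - 1
    return sum(comb(k, j) * q**j * (1 - q) ** (k - j) for j in range(m0, k + 1))


def simulate_lower_coverage(m0: int, m1: int, trials: int, seed: int = 0):
    """Simulated values of mu(D_0) for uniform X on [0, 1]."""
    rng = random.Random(seed)
    m = m0 + m1
    values = []
    for _ in range(trials):
        ordered = sorted(rng.random() for _ in range(m - 1))
        # x* is the m0-th smallest observation; for the uniform
        # distribution its value equals the coverage of D_0 = {x < x*}.
        values.append(ordered[m0 - 1])
    return values
```

Comparing the fraction of simulated values not exceeding some $q$ with `incomplete_beta_int(q, m0, m1)` gives an empirical check of the lemma in this special case.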


5. The construction. For the sake of brevity, we shall employ a somewhat
colloquial language. In the construction two persons are involved: a statistician
(S) and his assistant (A). A knows precisely the actual outcomes of the $n$ independent
observations $x_1, \dots, x_n$ on $X$, while, at the very outset, S has no information
at all about these outcomes. On the other hand, S has at his disposal a
class $H$ of orderings $o$ in $\Omega$ known to be continuous with respect to the distribution
$\mu(A)$ of $X$.

In the first step of the construction, S selects an ordering $o_1$ from $H$ and a
positive integer $m_0$, $m_0 \le n$, and asks A to give him the $m_0$-th smallest observation
$x^*(1)$ with respect to $o_1$ (this element is unique with probability 1), together
with the two sets of indices corresponding to the $m_0 - 1$ and $m_1 - 1 = n - m_0$
observations which are smaller or larger than $x^*(1)$, respectively. Now, S can
draw the $(m_0, m_1)$-partition $\Omega = \Omega_0 \cup \Omega^* \cup \Omega_1$ of $\Omega$ with respect to $o_1$ and the
set of the $n$ observations $x_i$ in $\Omega$. Let $D^0(0) = \Omega$, $D^1(j) = \Omega_j$ $(j = 0, 1)$, and
$D^*_1 = \Omega^*$.

After $k$ steps, $0 < k \le n - 1$, S has obtained a partition of $\Omega$ into $k + 1$
disjoint "blocks" $D^k(j)$ $(j = 0, 1, \dots, k)$ and $k$ boundary sets $D^*_i$ $(i = 1,
\dots, k)$, each of $\mu$-measure 0. Further, for each of these $2k + 1$ sets, S knows
precisely the set of indices corresponding to the observations $x_i$ within the set.
Finally, for each boundary set $D^*_i$ $(i = 1, \dots, k)$, S knows the actual value of
the boundary observation $x^*(i)$ in $D^*_i$ (with a probability 1 these boundary observations
are unique).

Now, the $(k + 1)$-th step of the construction proceeds as follows: His choice
depending, in any way whatsoever,¹ on the knowledge acquired, S chooses:
(i) A distinguished block $D = D^k(j^*)$ among those of the $k + 1$ blocks $D^k(j)$
$(j = 0, \dots, k)$ which contain at least one observation. (ii) A positive integer
$m_0$ not larger than the number $m - 1$ of observations in $D$. (iii) An ordering
$o_{k+1}$ from $H$.

He then asks A for the $m_0$-th smallest observation $x^*(k + 1)$ in $D$ with respect
to $o_{k+1}$, together with the two sets of indices corresponding to the $m_0 - 1$ or
$m_1 - 1 = m - m_0 - 1$ observations in $D$ which are smaller or larger than
$x^*(k + 1)$, respectively.

Using the acquired value $x^*(k + 1)$, S is now able to draw the $(m_0, m_1)$-partition
$D = D_0 \cup D^* \cup D_1$ of $D$ with respect to $o_{k+1}$ and the $m - 1$ observations
in $D$. Afterwards, he renumbers the blocks $D^k(0), \dots, D^k(j^* - 1)$, $D_0$, $D_1$,
$D^k(j^* + 1), \dots, D^k(k)$ as $D^{k+1}(j)$ $(j = 0, \dots, k + 1)$, in this order. Finally,
let $D^*_{k+1} = D^*$.

After exactly $n$ steps the construction stops. Then S has obtained a partition
of $\Omega$ into $n + 1$ disjoint blocks $U_j = D^n(j)$ $(j = 0, \dots, n)$ and, further, $n$ boundary
sets $D^*_k$ $(k = 1, \dots, n)$, each of $\mu$-measure 0.

Theorem 1. The coverages $c_j = \mu(U_j)$ $(j = 0, \dots, n)$ have the joint distribution
$n!\, dc_1\, dc_2 \cdots dc_n$, where $c_j \ge 0$, $c_1 + \dots + c_n = 1 - c_0 \le 1$. Moreover,
the union $U$ of $m$ distinct sets $U_j$ has a coverage $p = \mu(U)$ with d.f.
$I_p(m, n + 1 - m)$.
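The construction can be illustrated with a small simulation. The sketch below is only one possible instance, under assumptions that are not in the paper: $X$ is uniform on the unit square, the class $H$ consists of the two coordinate orderings, and S always splits the fullest block at its middle observation along the longer side of the block. Blocks are then rectangles and the coverage of a block is its area. The names `build_blocks` and `block_coverages` are hypothetical.

```python
import random


def build_blocks(points):
    """Run the n-step construction on points in the unit square.

    Each block is a tuple (x_lo, x_hi, y_lo, y_hi, observations_inside).
    """
    blocks = [(0.0, 1.0, 0.0, 1.0, list(points))]
    for _ in range(len(points)):
        # (i) distinguished block: the first block holding the most observations
        j = max(range(len(blocks)), key=lambda i: len(blocks[i][4]))
        x_lo, x_hi, y_lo, y_hi, inside = blocks[j]
        # (iii) ordering: by first coordinate if the block is at least as wide
        # as it is tall, otherwise by second coordinate (this choice uses only
        # the boundary values already known to S)
        axis = 0 if (x_hi - x_lo) >= (y_hi - y_lo) else 1
        # (ii) m0: take the middle observation as the boundary observation x*
        m0 = (len(inside) + 1) // 2
        ordered = sorted(inside, key=lambda p: p[axis])
        cut = ordered[m0 - 1][axis]
        lower, upper = ordered[: m0 - 1], ordered[m0:]
        if axis == 0:
            d0 = (x_lo, cut, y_lo, y_hi, lower)
            d1 = (cut, x_hi, y_lo, y_hi, upper)
        else:
            d0 = (x_lo, x_hi, y_lo, cut, lower)
            d1 = (x_lo, x_hi, cut, y_hi, upper)
        blocks[j : j + 1] = [d0, d1]  # renumber: D0, D1 take the place of D
    return blocks


def block_coverages(n, rng):
    """Coverages c_0, ..., c_n when X is uniform on the unit square."""
    points = [(rng.random(), rng.random()) for _ in range(n)]
    return [(xh - xl) * (yh - yl) for xl, xh, yl, yh, _ in build_blocks(points)]
```

According to Theorem 1, each single coverage $c_j$ should then have d.f. $I_p(1, n)$, with mean $1/(n + 1)$, even though the shapes of the blocks depend on the data; averaging `block_coverages(n, rng)[0]` over many samples gives a rough check.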

Let $0 < \alpha < 1$, and let $p = p_m(\alpha)$ be such that $I_p(m, n + 1 - m) = \alpha$. Then
with a probability $1 - \alpha$, the random set $U$ contains at least a proportion $p_m(\alpha)$
of the total probability mass 1 in $\Omega$ (i.e., we have confidence limits on the distribution
of $X$ or its parameters). For $\alpha = .01$ or $.05$, the value $p_m(\alpha)$ may be
determined by using $F$-tables. Let $F_\alpha$ be the $\alpha$-point of the $F$-distribution
with $n_1 = 2(n + 1 - m)$ and $n_2 = 2m$ degrees of freedom. Then $p_m(\alpha) =
(1 + F_\alpha n_1/n_2)^{-1}$.
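Where $F$-tables are not at hand, $p_m(\alpha)$ can also be found by solving $I_p(m, n + 1 - m) = \alpha$ directly. The following is a minimal sketch with hypothetical function names; it assumes integer $m$ and $n$, uses the binomial tail form of the incomplete Beta-function, and solves by bisection rather than through the $F$-distribution.

```python
from math import comb


def coverage_cdf(p: float, m: int, n: int) -> float:
    """I_p(m, n + 1 - m), written as a binomial tail sum."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(m, n + 1))


def coverage_lower_bound(m: int, n: int, alpha: float, tol: float = 1e-12) -> float:
    """Solve I_p(m, n + 1 - m) = alpha for p by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # the d.f. is increasing in p, so move toward the crossing point
        if coverage_cdf(mid, m, n) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For $m = n$ the d.f. reduces to $p^n$, so the result should agree with $\alpha^{1/n}$, which gives a simple sanity check of the sketch.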

Some warning seems desirable. If the construction stops after $k$ steps, we
have a partition of $\Omega$ into the blocks $D^k(j)$ $(j = 0, \dots, k)$ and, further, $k$ boundary
sets of measure 0. Let the random variable $n_j - 1$ denote the number of observations
in $D^k(j)$ $(j = 0, \dots, k)$, and let $N_j = n_0 + \dots + n_{j-1}$. One can
easily see that $D^k(j)$ is the union of the "final" blocks $U_{N_j}, \dots, U_{N_j+n_j-1}$ (which
might be found by completing the construction) and, further, some set of measure
0. However (as certain counterexamples show), this does not imply that
conditional to $n_j = m$ ($m$ given) the coverage of $D^k(j)$ has the conditional distribution
$I_p(m, n + 1 - m)$. Generally, the latter conclusion is only justified
when (with a probability 1) both $N_j$ and $n_j$ are constant.

¹ Chance decisions are also allowed. For example, instead of making each decision as the
necessity for it arises, S could start with a complete plan which provides for all contingencies.
Then we may as well assume that S has already determined beforehand the actual
outcomes of the random decisions which might arise.


6. Proof of Theorem 1. In order to keep the proof on an elementary level, we
shall avoid an explicit use of the usual complicated measure preserving transformations
(cf. Fraser [8], pp. 53-54). Let $t_j = t(j) = s_j\sqrt{-1}$ $(j = 0, \dots, n)$
be complex parameters with $s_j$ real. It suffices to show that the characteristic
function

$E[\exp(t_0 \log c_0 + \dots + t_n \log c_n)] = E(c_0^{t_0} \cdots c_n^{t_n})$

depends only on $n$ and the $t_j$ but not on the distribution of $X$ or the actual
mode of construction. For then the joint distribution of $\log c_0, \dots, \log c_n$, and
hence the joint distribution of $c_0, \dots, c_n$, will not be affected when $X$ is replaced
by a real random variable, uniformly distributed in $[0, 1]$, and when the ordering
$o_k$ $(k = 1, \dots, n)$ is replaced by the common ordering in $[0, 1]$. But then
$c_0, \dots, c_n$ become the differences between consecutive order statistics, and
Theorem 1 now follows from a well-known result (cf. Wilks [1]).

After $k$ steps, $0 \le k \le n$, the construction based on the sample $W =
(x_1, \dots, x_n)$ yields (with a probability 1) a partition of $\Omega$ into the $k + 1$ blocks
$D^k(j)$ $(j = 0, \dots, k)$ and, further, $k$ boundary sets of measure 0. Let the random
variable $n_j - 1$ equal the number of observations inside $D^k(j)$ and let $N_j =
n_0 + n_1 + \dots + n_{j-1}$. Now consider the quantity

$p_k = \prod_{j=0}^{k} \frac{\Gamma(n_j)\, \mu(D^k(j))^{t(N_j) + \dots + t(N_j + n_j - 1)}}{\Gamma(n_j + t(N_j) + \dots + t(N_j + n_j - 1))}$,

depending on the parameters $t_0, \dots, t_n$. Though for $1 \le k < n$ the joint distribution
of $\mu(D^k(0)), \dots, \mu(D^k(k))$ depends strongly on S's method of construction,
it turns out that, for any mode of construction and for each (arbitrary
but fixed) set of values $t_0, \dots, t_n$,

(1)  $E(p_k) = n!\, \Gamma(n + 1 + t_0 + \dots + t_n)^{-1}$   $(k = 0, 1, \dots, n)$.

For $k = n$, we have $n_j = 1$, $N_j = j$, $\mu(D^n(j)) = \mu(U_j) = c_j$ $(j = 0, \dots, n)$, and
(1) implies

$E(c_0^{t_0} \cdots c_n^{t_n}) = n!\, \Gamma(n + 1 + t_0 + \dots + t_n)^{-1} \prod_{j=0}^{n} \Gamma(t_j + 1)$,

where indeed the right-hand side depends only on $n$ and the $t_j$.
Formula (1) is evident for $k = 0$. For, $\mu(D^0(0)) = \mu(\Omega) = 1$ and $n_0 = n + 1$,
$N_0 = 0$ (when $k = 0$) imply that $p_0$ is always equal to the right-hand side of (1).
Let $k$ be a fixed integer, $0 \le k \le n - 1$; it suffices to prove that $E(p_k) = E(p_{k+1})$.

Let $D^k(j)$ $(j = 0, \dots, k)$, $D = D^k(j^*)$, $D_0$, $D_1$, $m$, $m_0$, and $m_1$ be as defined
in the $(k + 1)$-th step of the construction. Here, with probability 1, $D_0$ and $D_1$
contain precisely $m_0 - 1$ and $m_1 - 1$ observations, respectively ($m_0 + m_1 = m$).
Moreover, $\mu(D) = \mu(D_0) + \mu(D_1)$. It now follows from the definitions of $p_k$,
$p_{k+1}$, and the blocks $D^{k+1}(j)$ $(j = 0, \dots, k + 1)$ that

(2)  $p_{k+1} = p_k\, \frac{\Gamma(m_0)\,\Gamma(m_1)\,\Gamma(m + t' + t'')}{\Gamma(m)\,\Gamma(m_0 + t')\,\Gamma(m_1 + t'')}\, q^{t'} (1 - q)^{t''}$,

where $\mu(D_0) = q\mu(D)$ and, with $N = N_{j^*}$,

$t' = t(N) + \dots + t(N + m_0 - 1)$,   $t'' = t(N + m_0) + \dots + t(N + m - 1)$.


In view of the footnote to the construction, we may assume without loss of
generality that S has a complete non-random plan of construction which provides
for all contingencies. The following information $\Sigma$ has been received by
S from A during the first $k$ steps of the construction: (i) For $i = 1, \dots, k$, the
value $\xi_i$ and the index $\nu_i$ of the boundary observation $x^*(i)$. (ii) For $j = 0, \dots,
k$, the indices $\sigma(j, h)$ $(h = 1, \dots, n_j - 1)$ of the observations in the block $D^k(j)$.
Here, the $n$ different integers $\nu_i$ and $\sigma(j, h)$ together constitute the full set of
indices $(1, 2, \dots, n)$.

Knowing only $\Sigma$, S can reconstruct the blocks $D^k(j)$ $(j = 0, \dots, k)$ according
to plan; hence, $\Sigma$ is equivalent to the information known to S at the beginning
of the $(k + 1)$-th step. Therefore, $\Sigma$ completely determines the quantity $p_k$,
the distinguished block $D = D^k(j^*)$, together with the ordering $o_{k+1}$, and the
positive integers $m_0$ and $m_1$ $(m_0 + m_1 = m = n_{j^*})$ mentioned in the $(k + 1)$-th
step of the construction.

To almost all samples $W$ there corresponds a set of information $\Sigma$ of the above
type. Among these corresponding $\Sigma$'s, let $\Sigma_0$ be a specific set of information (i)
and (ii). Denoting the $i$th observation by $x(i)$, it is evident that in an actual
construction $\Sigma_0$ will arise if and only if: (i) $x(\nu_i) = \xi_i$ $(i = 1, \dots, k)$. (ii) For
$j = 0, \dots, k$, we have $x(\sigma(j, h)) \in D^k(j)$ $(h = 1, \dots, n_j - 1)$, where the set
$D^k(j)$ is uniquely determined by $\Sigma_0$. Hence, we have for $0 \le j \le k$ that, given
$\Sigma = \Sigma_0$, the observations $x(\sigma(j, h))$ $(h = 1, \dots, n_j - 1)$ behave as $n_j - 1$
independent observations on the random variable $X$ restricted to $X \in D^k(j)$
(provided $\mu(D^k(j)) > 0$).

Further, $D_0$ is obtained as the "lower" set in the $(m_0, m_1)$-partition of $D =
D^k(j^*)$ with respect to the $n_{j^*} - 1 = m - 1$ observations $x(\sigma(j^*, h))$ in $D$ and
the continuous ordering $o_{k+1}$. It follows from Lemma 2 that, given $\Sigma = \Sigma_0$,
we have $\mu(D_0) = q\mu(D)$, where $q$ has the conditional d.f. $I_q(m_0, m_1)$.

Moreover, given $\Sigma = \Sigma_0$, the quantities $p_k$, $m$, $m_0$, $m_1$, $t'$, and $t''$ are constants.
It now follows from (2) that

$E(p_{k+1} \mid \Sigma = \Sigma_0) = p_k = E(p_k \mid \Sigma = \Sigma_0)$,

implying that $E(p_{k+1}) = E(p_k)$.
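The conditional expectation used here rests only on the moments of the Beta distribution supplied by Lemma 2. Written out as a sketch, with $B(\cdot\,,\cdot)$ denoting the Beta function (a notation not used in the original):

```latex
E\bigl(q^{t'}(1-q)^{t''} \mid \Sigma = \Sigma_0\bigr)
  = \frac{1}{B(m_0, m_1)} \int_0^1 q^{\,m_0 + t' - 1} (1-q)^{\,m_1 + t'' - 1}\, dq
  = \frac{\Gamma(m)\,\Gamma(m_0 + t')\,\Gamma(m_1 + t'')}
         {\Gamma(m_0)\,\Gamma(m_1)\,\Gamma(m + t' + t'')} .
```

This is the reciprocal of the Gamma factor appearing in (2), so the two cancel and the conditional expectation of $p_{k+1}$ reduces to $p_k$.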
7. A remark. The above proof is not completely rigorous because the very last
step ("implying that") is still open to doubt for lack of a precise definition of
the expected values $E(p_{k+1} \mid \Sigma = \Sigma_0)$, $E(p_{k+1})$, etc. The latter omission is also
the root of the following difficulty.







If, in the construction, S's decisions depend too wildly (that is, in a non-measurable
way) on the available information, it may easily happen that the
coverage $c_j$ of the final block $U_j$ is a non-measurable function (with respect to
the Borel field $\mathfrak{A}^n$ in $\Omega^n$) of the sample point $W = (x_1, \dots, x_n)$. Then the question
arises as to what (in the assertion of Theorem 1) is meant by the probability
$\Pr(c_j \le a_j)$ $(j = 0, \dots, n)$. The following approach to this question, which
avoids additional measurability assumptions, was indicated to me by Prof. D.
A. S. Fraser.

For simplicity, let us assume that S starts with a complete non-random plan
which provides for all contingencies. Let $Q$ stand for a specific (a priori possible)
outcome of the indices of the observations inside the blocks $D^k(j)$ $(k = 1, \dots, n$;
$j = 0, \dots, k)$ and the indices $i_k$ of the boundary observations $x^*(k)$ $(k = 1,
\dots, n)$. Let the (finitely many) different possible outcomes $Q$ be denoted by
$Q_1, \dots, Q_p$. Let $f_r(W) = 1$ $(r = 1, \dots, p)$ when the construction based on $W$
yields the outcome $Q_r$; otherwise, $f_r(W) = 0$. Thus, $\sum_r f_r(W) = 1$ for almost
all $W$.

Let $F_n$ be the class of all the subsets $B$ of $\Omega^n$ such that, for $r = 1, \dots, p$, the
integral

$P_r(B) = \int_\Omega \Bigl[ \cdots \int_\Omega \bigl[ f_r(W)\, \chi_B(W)\, d\mu(x_{i_1}) \bigr] \cdots\, d\mu(x_{i_n}) \Bigr]$

has a meaning and exists as a repeated Lebesgue-Stieltjes integral; here $(i_1, \dots,
i_n)$ corresponds to $Q_r$, while $\chi_B(W)$ denotes the characteristic function of $B$.
Let $F_k$ $(0 \le k < n)$ be the class of $F_n$-sets $B$ such that $W_1 \in B$ implies $W_2 \in B$
whenever the two constructions based on $W_1$ and $W_2$ yield, at the end of the $k$th
step, exactly the same information $\Sigma$ (cf. the above proof).

One can show that: (i) $F_k$ is a Borel field $(k = 0, \dots, n)$ and $F_0 \subset F_1 \subset \dots \subset
F_n$. (ii) $P(B) = \sum_r P_r(B)$ defines a probability measure on $F_n$. (iii) The function
$p_k(W)$, employed in the above proof, is $F_k$-measurable $(k = 0, \dots, n)$;
hence, $c_j = c_j(W)$ is $F_n$-measurable. (iv) The above proof becomes exact by
defining (at the $(k + 1)$-th step) $E(y \mid \Sigma = \Sigma_0)$ as the conditional expectation
of $y$ relative to $F_k$ with $\{F_n, P(B)\}$ as the underlying measure space. (v) Consequently,
interpreting the assertion of Theorem 1 in terms of this same measure
space, we have a meaningful and true result.


8. The discontinuous case. The above procedure imposes one restriction on
the distribution $\mu(A)$ of $X$; namely, that each ordering (which might be used in
the construction) of the given class $H$ is a continuous ordering with respect to
$\mu(A)$. In the so-called discontinuous case, the distribution $\mu(A)$ of $X$ is completely
unrestricted. However, in this case the above construction might break down
with a positive probability in the sense that some boundary set will contain
more than one observation. This defect will be repaired as follows (cf. Fraser
[8], p. 50).

Let $Y$ be a real random variable, uniformly distributed in the unit interval
$L = [0, 1]$, which is independent of $X$ and let $X' = (X, Y)$, taking values in
$\Omega' = \Omega \times L$. To each ordering $o$ in $\Omega$ we associate the following ordering $o'$ in $\Omega'$:

$(x_1, y_1) < (x_2, y_2)$ if $x_1 < x_2$, or $x_1 \sim x_2$ and $y_1 < y_2$.


Let $H'$ consist of all orderings in $\Omega'$ which are associated to some ordering in
$\Omega$. Then, even in the discontinuous case, each ordering $o'$ in $H'$ is continuous with
respect to the distribution $\mu'(B)$ of $X'$.

Let $x_1, \dots, x_n$ and $y_1, \dots, y_n$ be independent observations on X
and Y, respectively. Then $x_i' = (x_i, y_i)$ $(i = 1, \dots, n)$ are n
independent observations on X'. Replacing in the above construction $\Omega$,
$H$, and $x_i$ by $\Omega'$, $H'$, and $x_i'$, respectively, we obtain a
partition of $\Omega'$ into the final blocks $U_j$ $(j = 0, \dots, n)$ and the
set of measure 0 consisting of the n observations $x_i'$. Clearly, the
coverages $c_j = \mu'(U_j)$ satisfy the assertions of Theorem 1. Thus we are
able to set precise tolerance limits on the distribution of X' = (X, Y) which
will yield some information on the distribution of X.

As a simple illustration: Let $o$ be any ordering in $\Omega$ and let
$x'(1) \le x'(2) \le \cdots \le x'(n)$ be the ordered set (with respect to
$o'$) of the n observations $x_i'$ on X'. Then $U = \{x' \mid x' < x'(m)\}$ has
a coverage $p = \mu'(U)$ with a cumulative d.f. $I_p(m, n + 1 - m)$. But, for
$x'(m) = (x(m), y(m))$,

$$\mu'(U) = \mu\{x \mid x < x(m)\} + y(m)\,\mu\{x \mid x \sim x(m)\} \ge \mu\{x \mid x < x(m)\} = P(X < x(m)) = c \ \text{(say)}.$$

Hence,

$$P(c \le p) \ge P(\mu'(U) \le p) = I_p(m, n + 1 - m),$$

a well-known result due to Scheffé and Tukey ([3], p. 191).
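
The randomization device can be tried numerically. The following is only a sketch under stated assumptions: the three-point distribution of X, the natural ordering of the real line, and the values n = 9, m = 3 are illustrative choices that do not come from the paper. It breaks ties with an independent uniform variate and computes the coverage $\mu'(U)$ of $U = \{x' < x'(m)\}$, whose mean should be close to $m/(n + 1)$, the mean of the incomplete beta law quoted above.

```python
import random


def randomized_coverage(support, probs, n, m, rng):
    """Coverage mu'(U) of U = {x' < x'(m)} for one sample of size n.

    Ties among the discrete observations are broken by an independent
    uniform variate, i.e. pairs (x, y) are ordered lexicographically.
    """
    xs = rng.choices(support, weights=probs, k=n)
    ys = [rng.random() for _ in range(n)]
    ordered = sorted(zip(xs, ys))          # lexicographic order on (x, y)
    x_m, y_m = ordered[m - 1]              # the m-th ordered pair
    below = sum(p for s, p in zip(support, probs) if s < x_m)
    tied = sum(p for s, p in zip(support, probs) if s == x_m)
    return below + y_m * tied              # mu{x < x(m)} + y(m) mu{x ~ x(m)}


def mean_coverage(n=9, m=3, trials=20000, seed=0):
    """Monte Carlo mean of the coverage for an assumed 3-point law of X."""
    rng = random.Random(seed)
    support, probs = [0, 1, 2], [0.2, 0.5, 0.3]   # illustrative only
    total = sum(randomized_coverage(support, probs, n, m, rng)
                for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    print(mean_coverage(), 3 / 10)
```

Although X has atoms, the randomized coverage behaves like the coverage in the continuous case, which is the point of passing from X to X' = (X, Y).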


REFERENCES 

[1] S. S. Wilks, "Determination of sample sizes for setting tolerance limits," Ann. Math.
Stat., Vol. 12 (1941), pp. 91-96.

[2] A. Wald, "An extension of Wilks' method for setting tolerance limits," Ann. Math.
Stat., Vol. 14 (1943), pp. 45-55.

[3] H. Scheffé and J. W. Tukey, "Non-parametric estimation: I. Validation of order
statistics," Ann. Math. Stat., Vol. 16 (1945), pp. 187-192.

[4] J. W. Tukey, "Nonparametric estimation: II. Statistically equivalent blocks and
tolerance regions, the continuous case," Ann. Math. Stat., Vol. 18 (1947), pp.
529-539.

[5] J. W. Tukey, "Nonparametric estimation: III. Statistically equivalent blocks and
multivariate tolerance regions, the discontinuous case," Ann. Math. Stat.,
Vol. 19 (1948), pp. 30-39.

[6] D. A. S. Fraser and R. Wormleighton, "Nonparametric estimation: IV.," Ann. Math.
Stat., Vol. 22 (1951), pp. 294-298.

[7] D. A. S. Fraser, "Sequentially determined statistically equivalent blocks," Ann.
Math. Stat., Vol. 22 (1951), pp. 372-381.

[8] D. A. S. Fraser, "Nonparametric tolerance regions," Ann. Math. Stat., Vol. 24 (1953),
pp. 44-55.





ON A CHARACTERIZATION OF THE STABLE LAW WITH FINITE 
EXPECTATION 


By R. G. Laha
Indian Statistical Institute 


1. Introduction and summary. A remarkable characterization of the normal 
law is that if x and y are two independent chance variables such that two linear
functions, ax + by (ab ≠ 0) and cx + dy (cd ≠ 0), are distributed independently
of each other, then both x and y are normally distributed. This theorem has
been proved without any assumption about the existence of moments by Dar- 
mois [2], extending earlier results of Gnedenko [4] and Kac [5]. The question that 
naturally arises is how far the condition of stochastic independence is neces- 
sary, or, in other words, whether the above theorem can be generalised after re- 
laxing the condition of stochastic independence of the linear functions of two 
independent chance variables. But it is evident that we can always construct two 
linear functions of non-normal mutually independent chance variables such that 
they are not independent in the probability sense. In the present paper we shall 
investigate the nature of the distribution law that may be obtained by imposing 
the mild restriction of the linearity of regression of one linear function on the 
other, which is, of course, weaker than the assumption of stochastic independence. 
We shall prove a general theorem from which a number of results will follow as 
special cases. But it should be noted that the statements regarding regression or 
conditional expectation require the assumption that the conditional distribution 
function exist, and in the following, this assumption will be tacitly made wherever 
needed. 


2. Results. First of all, we shall give a short proof of the following lemma of 
Rao [8], Rothschild and Mourier [10]. 

LEMMA. Let x and y be two proper random variables each having a finite expectation
(which we may assume to be zero without any loss of generality) such that the
regression of y on x exists. Then the necessary and sufficient condition for the
regression of y on x to be linear is that

$$\left.\frac{\partial \varphi(u, v)}{\partial v}\right|_{v=0} = \beta\, \frac{\partial \varphi(u, 0)}{\partial u},$$

where $\varphi(u, v)$ stands for the characteristic function of the joint cumulative
distribution of x and y, and $\beta$ is a constant.


Received October 11, 1954; revised March 17, 1955.

PROOF OF NECESSITY. Since $\varphi(u, v)$ represents the characteristic function of
the joint cumulative distribution of x and y, we have

$$\varphi(u, v) = E\left(e^{iux + ivy}\right) = \iint e^{iux + ivy}\, dF(x, y) = \int e^{iux} \left[ \int e^{ivy}\, dF_x(y) \right] dF(x),$$

where $F_x(y)$ represents the conditional distribution function of y for fixed x.
But since the expectations of both x and y are assumed to exist and to be
equal to zero and, further, the regression of y on x is linear, we must have

$$\left.\frac{\partial \varphi(u, v)}{\partial v}\right|_{v=0} = i \int e^{iux} \left[ \int y\, dF_x(y) \right] dF(x) = i\beta \int x\, e^{iux}\, dF(x) = \beta\, \frac{\partial \varphi(u, 0)}{\partial u}.$$


PROOF OF SUFFICIENCY. Since the regression of y on x is assumed to exist,
let us denote it by $E_x(y)$, so that $E_x(y) = \int y\, dF_x(y)$.
Then proceeding as above, it can be very easily shown that

$$\left.\frac{\partial \varphi(u, v)}{\partial v}\right|_{v=0} = i \int e^{iux}\, E_x(y)\, dF(x).$$

Hence, the condition

$$\left.\frac{\partial \varphi(u, v)}{\partial v}\right|_{v=0} = \beta\, \frac{\partial \varphi(u, 0)}{\partial u}$$

gives

$$\int e^{iux} \left[ E_x(y) - \beta x \right] dF(x) = 0.$$

Then, from the uniqueness theorem of Fourier transforms of functions of bounded
variation, it follows that $E_x(y) = \beta x$, for all x, except for a set of probability
measure zero.
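
The lemma can be checked numerically on a small discrete example. The sketch below is an illustration only: the three-point law of x, the two-point noise, and the value of beta are assumptions made for the example and are not taken from the paper. It evaluates both partial derivatives of the joint characteristic function at v = 0 as finite sums and compares them.

```python
import cmath

BETA = 2.0  # assumed regression coefficient for the illustration


def joint_pmf(linear=True):
    """Return {(x, y): probability} for a zero-mean pair (x, y).

    With linear=True, E(y | x) = BETA * x; otherwise E(y | x) = x^2 - 1.5,
    which is zero-mean (E x^2 = 1.5) but not linear in x.
    """
    px = {-2.0: 0.25, 0.0: 0.25, 1.0: 0.5}          # E(x) = 0
    pmf = {}
    for x, p in px.items():
        centre = BETA * x if linear else x * x - 1.5
        spread = 1.0 + abs(x)                        # noise depends on x
        for e in (-1.0, 1.0):                        # zero-mean noise
            key = (x, centre + spread * e)
            pmf[key] = pmf.get(key, 0.0) + 0.5 * p
    return pmf


def d_phi_dv_at_v0(pmf, u):
    """Partial derivative of phi(u, v) with respect to v, at v = 0."""
    return sum(p * 1j * y * cmath.exp(1j * u * x) for (x, y), p in pmf.items())


def d_phi_du_at_v0(pmf, u):
    """Derivative of phi(u, 0) with respect to u."""
    return sum(p * 1j * x * cmath.exp(1j * u * x) for (x, y), p in pmf.items())


def lemma_gap(pmf, beta, u):
    """Absolute difference between the two sides of the lemma's condition."""
    return abs(d_phi_dv_at_v0(pmf, u) - beta * d_phi_du_at_v0(pmf, u))


if __name__ == "__main__":
    for u in (0.3, 1.0, 2.5):
        print(u, lemma_gap(joint_pmf(True), BETA, u),
              lemma_gap(joint_pmf(False), BETA, u))
```

For the linear case the gap is zero up to rounding at every u, while the quadratic regression leaves a visible gap, in line with the two directions of the lemma.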

THEOREM 1. Let x, $\xi$, and $\eta$ be three proper random variables, each having a finite
expectation (which may be assumed to be zero without any loss of generality) such
that x is distributed independently of the joint distribution of $\xi$ and $\eta$, but $\xi$ and $\eta$
have a joint distribution where the regression of $\eta$ on $\xi$ exists and is linear and given
by $E_\xi(\eta) = \beta_0 \xi$. Then the regression of $Y = cx + \eta$ on $X = ax + \xi$ $(a \ne 0)$ is
always linear irrespective of the distribution functions of x, $\xi$, and $\eta$, whenever the
relationship $c = a\beta_0$ is satisfied.

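PROOF. Let $\Phi(u, v)$, $\varphi(u, v)$, and $\varphi_1(u)$ represent the characteristic functions
of the joint cumulative distribution of (X, Y), ($\xi$, $\eta$), and the cumulative
distribution of x, respectively. Then,

$$\Phi(u, v) = E\left[e^{iuX + ivY}\right] = E\left[e^{i(au + cv)x + iu\xi + iv\eta}\right] = \varphi_1(au + cv)\,\varphi(u, v). \tag{1}$$

Now, differentiating both sides of (1) with respect to v and then putting v = 0,
we get

$$\left.\frac{\partial \Phi(u, v)}{\partial v}\right|_{v=0} = c\,\varphi_1'(au)\,\varphi(u, 0) + \left.\frac{\partial \varphi(u, v)}{\partial v}\right|_{v=0} \varphi_1(au). \tag{2}$$

But, using the lemma above, since $E_\xi(\eta) = \beta_0 \xi$,

$$\left.\frac{\partial \varphi(u, v)}{\partial v}\right|_{v=0} = \beta_0\, \frac{\partial \varphi(u, 0)}{\partial u}. \tag{3}$$

Next, substituting (3) in (2), we get

$$\left.\frac{\partial \Phi(u, v)}{\partial v}\right|_{v=0} = c\,\varphi_1'(au)\,\varphi(u, 0) + \beta_0\,\varphi'(u, 0)\,\varphi_1(au). \tag{4}$$

Again, putting v = 0 in (1) and then differentiating both sides with respect to
u, we get

$$\frac{\partial \Phi(u, 0)}{\partial u} = a\,\varphi_1'(au)\,\varphi(u, 0) + \varphi'(u, 0)\,\varphi_1(au). \tag{5}$$

Now, if $c = a\beta_0$, substituting this value of c in (4) and then comparing with
(5), we get easily

$$\left.\frac{\partial \Phi(u, v)}{\partial v}\right|_{v=0} = \beta_0\, \frac{\partial \Phi(u, 0)}{\partial u}. \tag{6}$$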


Then from the lemma above, it follows that the regression of Y on X is always
linear, whatever may be the distribution function of x, $\xi$, and $\eta$. From Theorem
1, it follows that if $\eta$ and $\xi$ are stochastically independent, and further if c = 0,
then the regression of Y on X is always linear, since in this case $\beta_0 = 0$ and the
relationship $c = a\beta_0$ is satisfied.

Similarly, if $\xi = by$ $(b \ne 0)$ and $\eta = dy$ $(d \ne 0)$, and further if bc = ad, the
regression of Y on X is always linear.
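
Theorem 1 can be verified exactly on a toy example with rational arithmetic. The sketch below is only an illustration: the two-point laws of x and of $\xi$, the noise added to $\beta_0 \xi$, and the values of a and c are all assumptions chosen for the example. It computes $E(Y \mid X = s)$ for every attainable s and tests whether it is a constant multiple of s.

```python
from collections import defaultdict
from fractions import Fraction as Fr

# Illustrative zero-mean laws (assumptions, not from the paper).
X_PMF = {Fr(-1): Fr(1, 2), Fr(1): Fr(1, 2)}     # law of x
XI_PMF = {Fr(-2): Fr(1, 3), Fr(1): Fr(2, 3)}    # law of xi
NOISE = {Fr(-1): Fr(1, 2), Fr(1): Fr(1, 2)}     # eta = BETA0 * xi + noise
BETA0 = Fr(2)


def conditional_means(a, c):
    """Exact E(Y | X = s) for Y = c x + eta and X = a x + xi."""
    mass = defaultdict(Fr)
    weighted = defaultdict(Fr)
    for x, px in X_PMF.items():
        for xi, pxi in XI_PMF.items():
            for e, pe in NOISE.items():
                eta = BETA0 * xi + e             # E(eta | xi) = BETA0 * xi
                p = px * pxi * pe                # x independent of (xi, eta)
                big_x = a * x + xi
                mass[big_x] += p
                weighted[big_x] += p * (c * x + eta)
    return {s: weighted[s] / mass[s] for s in mass}


def is_linear(means):
    """True when E(Y | X = s) = beta * s for one constant beta."""
    slopes = {m / s for s, m in means.items() if s != 0}
    zero_ok = means.get(Fr(0), Fr(0)) == 0
    return zero_ok and len(slopes) == 1


if __name__ == "__main__":
    print(is_linear(conditional_means(a=1, c=2)))   # c = a * BETA0
    print(is_linear(conditional_means(a=1, c=5)))   # c != a * BETA0
```

With $c = a\beta_0$ the regression is linear whatever the laws of x and $\xi$; with another c it fails for these non-stable laws, which is the situation Theorem 2 addresses.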


3. Further results. 

THEOREM 2. With the same notations and assumptions as used in Theorem 1,
the necessary and sufficient condition for the regression of Y on X to be linear for
all a contained in a closed interval $(a_1, a_2)$, where either $a_1 < a_2 < 0$ or $0 < a_1 < a_2$,
and for some c for which the relationship $c \ne a\beta_0$ is satisfied for all a in the interval,
is that both x and $\xi$ should belong to a class of stable law with finite expectation.


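PROOF OF NECESSITY. Using the above lemma, the condition of the linearity
of regression of Y on X gives the relation

$$\left.\frac{\partial \Phi(u, v)}{\partial v}\right|_{v=0} = \beta\, \frac{\partial \Phi(u, 0)}{\partial u}. \tag{7}$$

Next, using (4), (5), and (7) together, we get, after a little rearrangement of
terms,

$$(c - a\beta)\,\varphi_1'(au)\,\varphi(u, 0) = (\beta - \beta_0)\,\varphi_1(au)\,\varphi'(u, 0). \tag{8}$$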


We shall first show that in (8) neither $c - a\beta$ nor $\beta - \beta_0$ can be equal to zero
under the conditions of the theorem.

Let us suppose that $\beta - \beta_0 = 0$ when $c - a\beta \ne 0$. In this case, (8) reduces to

$$\varphi_1'(au)\,\varphi(u, 0) = 0. \tag{9}$$

Since $\varphi(u, 0)$ is continuous and equal to unity at the origin, u = 0, there always
exists a neighbourhood, say $u_0 > 0$, such that for all $|u| < u_0$, we have
$\varphi(u, 0) \ne 0$. Then it follows from (9) that for $|u| < u_0/a_0$, we have

$$\varphi_1'(u) = 0, \tag{10}$$

where $a_0$ is the larger of $|a_1|$ and $|a_2|$. This implies that the distribution of x itself
is improper, the whole mass being concentrated at the origin x = 0.
Similarly, if $c - a\beta = 0$ when $\beta - \beta_0 \ne 0$, (8) reduces to

$$\varphi_1(au)\,\varphi'(u, 0) = 0. \tag{11}$$

From (11), proceeding exactly as above, it can be shown that the distribution
of $\xi$ itself is improper, the whole mass being concentrated at the origin, $\xi = 0$.
But both these cases contradict the conditions of the theorem. Now the only
alternative left is when both $c - a\beta$ and $\beta - \beta_0$ vanish simultaneously. But in
this case we have $c = a\beta_0$, which is again contrary to the conditions of the
theorem.

Now it may be noted that both $\varphi_1(u)$ and $\varphi(u, 0)$ may have real roots. Let $\epsilon$
and $\delta$ denote the smallest of the absolute values of the real roots of $\varphi_1(u)$ and
$\varphi(u, 0)$, respectively. Since both $\varphi_1(u)$ and $\varphi(u, 0)$ are continuous functions of u
and since $\varphi_1(0) = \varphi(0, 0) = 1$, it follows that $\epsilon > 0$ and $\delta > 0$. Then, restricting
the values of a to an interval $I$, $(a_1, a_2)$, for which $|a| < \epsilon/\delta$, we can always
take the neighbourhood of the origin to be defined by $|u| < \delta$. Thus we have
proved the existence of a neighbourhood $|u| < \delta$ of the origin and of an interval
$I$, $(a_1, a_2)$, such that both $\varphi_1(au)$ and $\varphi(u, 0)$ do not vanish if $a \in I$ and $|u| < \delta$.

Then, confining the values of u and a in these intervals, since the product
$\varphi_1(au)\,\varphi(u, 0) \ne 0$, we may divide both sides of (8) by $\varphi_1(au)\,\varphi(u, 0)$ and thereby
obtain

$$(c - a\beta)\, \frac{\varphi_1'(au)}{\varphi_1(au)} = (\beta - \beta_0)\, \frac{\varphi'(u, 0)}{\varphi(u, 0)}. \tag{12}$$





Next, integrating (12) with respect to u, we get

$$\ln \varphi_1(au) = \frac{a(\beta - \beta_0)}{c - a\beta}\, \ln \varphi(u, 0), \tag{13}$$

where the constant of integration vanishes by virtue of the fact that
$\ln \varphi_1(0) = \ln \varphi(0, 0) = 0$.

Since the first moment of x exists, $\ln \varphi_1(au)$ is differentiable with respect to a
in the interval $(a_1, a_2)$. Thus it follows that $\theta(a) = a(\beta - \beta_0)/(c - a\beta)$ must
also be differentiable with respect to a in the same interval; denoting this
derivative by $\theta'(a)$, we may write

$$u\, \frac{\varphi_1'(au)}{\varphi_1(au)} = \theta'(a)\, \ln \varphi(u, 0). \tag{14}$$


Again, from the conditions of the theorem, $\theta(a) \ne 0$ for all a in the interval
$(a_1, a_2)$. Hence, using (12) and (14) together, we get

$$u\, \frac{\varphi'(u, 0)}{\varphi(u, 0)} = \frac{a\,\theta'(a)}{\theta(a)}\, \ln \varphi(u, 0) = \lambda\, \ln \varphi(u, 0), \tag{15}$$

where $\lambda = a\,\theta'(a)/\theta(a)$ for all a contained in the interval $(a_1, a_2)$, and thus it
follows evidently from (15) that $\lambda$ is independent of a.

Then, excluding the origin from the interval $|u| < \delta$, that is, in the intervals
$(0 < u < +\delta)$ and $(-\delta < u < 0)$, we may divide both sides of (15) by
$u \ln \varphi(u, 0)$, and obtain

$$\frac{1}{\ln \varphi(u, 0)}\, \frac{\varphi'(u, 0)}{\varphi(u, 0)} = \frac{\lambda}{u}. \tag{16}$$

Hence, integrating (16) with respect to u, we get

$$\ln \ln \varphi(u, 0) = \begin{cases} \lambda \log |u| + \log c_1, & \text{for } 0 < u < +\delta, \\ \lambda \log |u| + \log c_2, & \text{for } -\delta < u < 0. \end{cases} \tag{17}$$
Now, (17) evidently leads to the relation

$$\varphi(u, 0) = \begin{cases} e^{c_1 |u|^{\lambda}}, & \text{for } 0 < u < +\delta, \\ e^{c_2 |u|^{\lambda}}, & \text{for } -\delta < u < 0, \end{cases} \tag{18}$$

where $c_1$ and $c_2$ are the constants of integration. But it is well known that
necessary conditions for a function $\varphi(t)$ to be a characteristic function are:

(i) $\varphi(0) = 1$, (ii) $|\varphi(t)| \le 1$, and (iii) $\varphi(-t) = \overline{\varphi(t)}$.

Hence, it evidently follows that $c_1$ and $c_2$ in (18) should be complex
conjugates; that is, we may write

$$c_1 = -(A + iB) \quad \text{and} \quad c_2 = -(A - iB),$$

where $A \ge 0$. Thus the formula,

$$\varphi(u, 0) = \exp\left[ -\left( A + iB\, \frac{u}{|u|} \right) |u|^{\lambda} \right]$$

holds for all u in the interval $|u| < \delta$.

It can be easily shown that $\delta = +\infty$, since from the continuity of the
characteristic function, we have $\varphi(\pm\delta, 0) \ne 0$, which contradicts the assumption that
$\delta$ is the smallest of the absolute values of the real roots of $\varphi(u, 0)$. Hence, the
characteristic function of the distribution of $\xi$ is given by

$$\varphi(u, 0) = \exp\left[ -\left( A + iB\, \frac{u}{|u|} \right) |u|^{\lambda} \right]. \tag{19}$$

Now, it should be noted that the case A = 0 should be excluded, since when
A = 0, $|\varphi(u, 0)| = 1$ for all u; this leads to the trivial case that the whole mass
of the distribution is concentrated at a single point.

It is already pointed out in (15) that $\lambda$ does not involve a, so that on solving
$\lambda = a[\theta'(a)/\theta(a)]$ as a differential equation in a, we get

$$\theta(a) = K |a|^{\lambda}. \tag{20}$$

Hence, we have from (13)

$$\varphi_1(au) = \exp\left[ -K \left( A + iB\, \frac{u}{|u|} \right) |au|^{\lambda} \right], \tag{21}$$

where K > 0 for the same reason as A > 0.

Next, we shall show that $1 < \lambda \le 2$. If $\lambda \le 1$, the first derivatives of both
$\varphi_1(u)$ and $\varphi(u, 0)$ fail to exist at the origin, which means that the first moments of
$\xi$ and x do not exist, contrary to the assumption of our theorem. On the other
hand, if $\lambda > 2$, the second derivatives of both the functions $\varphi_1(u)$ and $\varphi(u, 0)$
exist and vanish at the origin. In this case, the second moments of both $\xi$ and x
exist and are equal to zero, which means that the whole mass of the distribution
is concentrated at the point $\xi = x = 0$, and we have $\varphi_1(u) = \varphi(u, 0) = 1$ for
all u (cf. Cramér [1]). Now, from Lévy and Khintchine [6], it follows evidently
that the characteristic functions (19) and (21) uniquely determine the
distribution function of a stable law with finite expectation when and only when the
parameters A, B, K, and $\lambda$ satisfy the restrictions

$$A > 0; \quad K > 0; \quad 1 < \lambda \le 2; \quad \left| B \cos\left(\frac{\pi\lambda}{2}\right) \right| \le A \sin\left(\frac{\pi\lambda}{2}\right). \tag{22}$$
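
The form (19) and the scaling relation that links (21) to (19) through (13) and (20) can be evaluated directly. The sketch below is an illustration under assumed parameter values (A = 1, B = 0.4, $\lambda$ = 1.5, K = 2 and a positive a are arbitrary choices); the function names are hypothetical. It checks the three necessary conditions for a characteristic function listed before (19) and the identity $\ln \varphi_1(au) = K|a|^{\lambda} \ln \varphi(u, 0)$ for a > 0.

```python
import cmath


def log_cf(w, A, B, lam, scale=1.0):
    """ln of exp[-scale * (A + iB sgn(w)) |w|^lam], the form used in (19), (21)."""
    if w == 0:
        return 0j
    sign = 1.0 if w > 0 else -1.0
    return -scale * (A + 1j * B * sign) * abs(w) ** lam


def cf(w, A, B, lam, scale=1.0):
    """The characteristic function itself."""
    return cmath.exp(log_cf(w, A, B, lam, scale))


def scaling_gap(a, u, A, B, lam, K):
    """|ln phi_1(a u) - K |a|^lam ln phi(u, 0)|, with phi_1 carrying the factor K."""
    return abs(log_cf(a * u, A, B, lam, scale=K)
               - K * abs(a) ** lam * log_cf(u, A, B, lam))


if __name__ == "__main__":
    A, B, lam, K = 1.0, 0.4, 1.5, 2.0      # assumed values for illustration
    print(cf(0.0, A, B, lam), abs(cf(1.3, A, B, lam)))
    print(scaling_gap(2.0, 0.7, A, B, lam, K))
```

The check is restricted to a > 0 because the factor u/|u| in (21) is written in terms of u rather than au.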





PROOF OF SUFFICIENCY. We have to show that if the distribution functions of
$\xi$ and x are characterized by (19) and (21), respectively, with the parameters
satisfying the restrictions as listed in (22), and further that if the first moment of
$\eta$ exists and the regression of $\eta$ on $\xi$ exists and is given by $E_\xi(\eta) = \beta_0 \xi$, then the
regression of $Y = cx + \eta$ on $X = ax + \xi$ should also be linear, where x is
independent of the joint distribution of $\xi$ and $\eta$.

Here we have

$$\ln \varphi_1(au) = K |a|^{\lambda} \ln \varphi(u, 0). \tag{23}$$

Then we have

$$\frac{\varphi_1'(au)}{\varphi_1(au)} = \frac{1}{a}\, \frac{d \ln \varphi_1(au)}{du} = \frac{1}{a}\, K |a|^{\lambda}\, \frac{d \ln \varphi(u, 0)}{du},$$

so that we have

$$\varphi_1'(au)\,\varphi(u, 0) = \frac{1}{a}\, K |a|^{\lambda}\, \varphi_1(au)\,\varphi'(u, 0). \tag{24}$$

Then, if $\Phi(u, v)$ stands for the characteristic function of the joint cumulative
distribution of X and Y, we have, on substituting the value of $\varphi_1'(au)\,\varphi(u, 0)$ as in
(24) in (4) and (5) above,

$$\left.\frac{\partial \Phi(u, v)}{\partial v}\right|_{v=0} = \left( \frac{c}{a}\, K |a|^{\lambda} + \beta_0 \right) \varphi_1(au)\,\varphi'(u, 0), \qquad \frac{\partial \Phi(u, 0)}{\partial u} = \left( 1 + K |a|^{\lambda} \right) \varphi_1(au)\,\varphi'(u, 0), \tag{25}$$

so that

$$\left.\frac{\partial \Phi(u, v)}{\partial v}\right|_{v=0} = \frac{\beta_0 + \dfrac{c}{a}\, K |a|^{\lambda}}{1 + K |a|^{\lambda}}\; \frac{\partial \Phi(u, 0)}{\partial u}. \tag{26}$$

Then the proof follows at once from (26), using the lemma.
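
The coefficient appearing in (26) can be cross-checked in the one case where the answer is known from elementary covariance algebra, namely $\lambda = 2$ with x and $\xi$ normal. In the sketch below, the identification $K = \mathrm{Var}(x)/\mathrm{Var}(\xi)$ for the normal case and all numerical values are assumptions made for the illustration; the function names are hypothetical.

```python
import math


def slope_from_26(a, c, beta0, K, lam):
    """Regression coefficient of Y on X read off from (26)."""
    w = K * abs(a) ** lam
    return (beta0 + (c / a) * w) / (1.0 + w)


def slope_normal(a, c, beta0, var_x, var_xi):
    """Cov(Y, X) / Var(X) for Y = c x + eta, X = a x + xi.

    Uses Cov(eta, xi) = beta0 * Var(xi), which follows from
    E(eta | xi) = beta0 * xi, and the independence of x and (xi, eta).
    """
    return (a * c * var_x + beta0 * var_xi) / (a * a * var_x + var_xi)


if __name__ == "__main__":
    a, c, beta0 = 1.5, 0.7, 2.0            # illustrative values
    var_x, var_xi = 4.0, 0.5
    K = var_x / var_xi                     # normal case: lambda = 2
    print(slope_from_26(a, c, beta0, K, 2.0),
          slope_normal(a, c, beta0, var_x, var_xi))
```

The two expressions agree, and both reduce to $\beta_0$ when $c = a\beta_0$, as in Theorem 1.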

It is also interesting to note in this connection that if we further assume that
either $\xi$ or x has a finite variance, that is, that the second derivative of either
(19) or (21) exists at the origin, then $\lambda$ should be equal to 2, and hence both x
and $\xi$ should be normally distributed.

COROLLARY 1. (The problem of Ragnar Frisch.) In the problem of Ragnar Frisch,
which has been solved independently by Rao [8], [9] and by Fix [3], it has been
assumed that x, $\xi$, and $\eta$ are mutually independent chance variables. Thus, it may be
treated as a special case of Theorem 2, above, by putting $\beta_0 = 0$.

COROLLARY 2. (Generalisation of Darmois' Theorem.) If x and y are two
independent chance variables with finite expectations such that the regression of
$Y = cx + dy$ $(d \ne 0)$ on $X = ax + by$ $(b \ne 0)$ exists and is linear for all a contained
in a closed interval $(a_1, a_2)$, where either $a_1 < a_2 < 0$ or $0 < a_1 < a_2$, and for some
c for which the relationship $bc \ne ad$ is satisfied for all a in the interval, then both
x and y should belong to the class of stable law with finite expectation.

This may be treated as a special case of Theorem 2, above, by taking $\xi = by$
and $\eta = dy$. Finally, we shall construct a simple counter-example to show that
the theorem is not true when the regression of Y on X is linear for some fixed a.

For this purpose, let us take

$$\ln \varphi(t) = \int_0^{\infty} (\cos |t| x - 1)\, \frac{e^{-\sin^2 \ln x}}{x^{1+\delta}}\, dx \qquad (1 < \delta < 2). \tag{27}$$
We shall show that ¢(¢) in (27) represents the characteristic function of a sym- 
metric infinitely divisible law with a finite first moment which is assumed to be 
zero. First of all, we note easily that ¢(¢) in (27), being real, represents the charac- 
teristic function of a symmetric law. Now, following the notations given by 
Loéve [7], we define 
00 sin2Inz 
6 


=a dx if z > 0, 


=" 


—sin®In|z 
Te dx ifx < 0, 
where 1 < 6 < 2. 

It can be easily verified that G(x) satisfies all the conditions stated in Loéve’s 
representation formula for the infinitely divisible law. Hence, ¢(¢) in (27), 
above, is the characteristic function of a symmetric infinitely divisible law. 

Using the transformation $|t| x = u$, (27) reduces to

$$\ln \varphi(t) = |t|^{\delta} \int_0^{\infty} (\cos u - 1)\, \frac{\exp\{-\sin^2 [\ln u - \ln |t|]\}}{u^{1+\delta}}\, du, \qquad 1 < \delta < 2. \tag{28}$$

Now the first derivative of $\varphi(t)$ in (28) exists at the origin, so that the first moment
exists and is equal to zero.
Again, from (28), we have

$$\ln \varphi(at) = |at|^{\delta} \int_0^{\infty} (\cos u - 1)\, \frac{\exp\{-\sin^2 [\ln u - \ln |at|]\}}{u^{1+\delta}}\, du, \qquad 1 < \delta < 2. \tag{29}$$

Then, using (28) and (29) together, we have at the point $a = e^{k\pi}$, where k takes
any of the values $\pm 1, \pm 2, \pm 3, \dots$,

$$\ln \varphi(at) = |a|^{\delta} \ln \varphi(t). \tag{30}$$
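
Relation (30) rests on the fact that $\sin^2$ has period $\pi$, so that shifting $\ln |t|$ by $k\pi$ leaves the integrand of (28) unchanged. The sketch below evaluates (28) numerically after the substitution $u = e^w$. It is an illustration only: $\delta = 1.5$, the truncation of the range of w to [-25, 6], and the trapezoid rule with 40000 steps are assumptions of this example, and the integral is therefore approximate.

```python
import math

DELTA = 1.5  # assumed value in (1, 2)


def log_phi(t, delta=DELTA, lo=-25.0, hi=6.0, n=40000):
    """Trapezoid approximation to ln phi(t) in the form (28), with u = e^w.

    After the substitution, du / u^(1 + delta) becomes exp(-delta * w) dw.
    """
    if t == 0:
        return 0.0
    shift = math.log(abs(t))
    h = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        w = lo + k * h
        kernel = -2.0 * math.sin(0.5 * math.exp(w)) ** 2      # cos(u) - 1
        weight = math.exp(-math.sin(w - shift) ** 2 - delta * w)
        term = kernel * weight
        total += 0.5 * term if k in (0, n) else term
    return abs(t) ** delta * h * total


def scaling_ratio(a, t, delta=DELTA):
    """ln phi(a t) / (|a|^delta ln phi(t)); equals 1 exactly when (30) holds."""
    return log_phi(a * t, delta) / (abs(a) ** delta * log_phi(t, delta))


if __name__ == "__main__":
    print(scaling_ratio(math.exp(math.pi), 1.0))               # a = e^pi
    print([scaling_ratio(2.0, t) for t in (0.5, 1.0, 2.0)])    # a = 2
```

For $a = e^{\pi}$ the ratio is 1 up to rounding, whereas for a = 2 it differs from 1 and changes with t, which is why the law (27) satisfies the scaling relation only at isolated values of a and is not stable.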


If the characteristic functions of the cumulative distributions of x and $\xi$ are
given by (27), above, then proceeding exactly in the same way as in (23), (24),
(25), and (26), it can be shown that the regression of $Y = cx + \eta$ on $X = ax + \xi$
is linear for some fixed a, where $a = e^{k\pi}$ and k is any one of the numbers
$\pm 1, \pm 2, \pm 3, \dots$.

In conclusion, the author expresses his thanks to the Referee for some helpful 
comments. 


REFERENCES 
[1] H. Cramér, Mathematical methods of statistics, Princeton University Press, 1946, p. 91.

[2] G. Darmois, "Sur une propriete caracteristique de la loi de probabilite de Laplace,"
C. R. Acad. Sci. Paris, Vol. 232 (1951), pp. 1999-2000.

[3] E. Fix, "Distributions which lead to linear regressions," Proceedings of the First
Berkeley Symposium on Mathematical Statistics and Probability, University of
California Press, 1949, pp. 79-91.

[4] B. V. Gnedenko, "On a theorem of S. N. Bernstein," Izvestiya Akad. Nauk SSSR. Ser.
Mat., Vol. 12 (1948), pp. 97-100.

[5] M. Kac, "A characterization of the normal distribution," Amer. J. Math., Vol. 61 (1939),
pp. 726-728.

[6] P. Lévy and A. Khintchine, "Sur les lois stables," C. R. Acad. Sci. Paris, Vol. 202
(1936), pp. 374-376.

[7] M. Loève, "Fundamental limit theorems of probability theory," Ann. Math. Stat.,
Vol. 21 (1950), pp. 329.

[8] C. R. Rao, "Note on a problem of Ragnar Frisch," Econometrica, Vol. 15 (1947), pp.
245-249.

[9] C. R. Rao, "A correction to note on a problem of Ragnar Frisch," Econometrica, Vol.
17 (1949), p. 212.

[10] Colette Rothschild and Edith Mourier, "Sur les lois de probabilité à regression
lineaire et écart type lié constant," Comptes Rendus 225 (1947), pp. 245-249.





NOTES 


A VARIABLE PROBABILITY DISTRIBUTION FUNCTION¹


By Ruric E. WHEELER 
Howard College, Birmingham 


1. Introduction and Summary. It is the purpose of this paper to develop an 
expression for the probability of x successes in n trials, P(n, x), where the prob- 
ability of success on a single trial depends both on the number of the trial and on 
the number of previous successes. This result should prove useful in obtaining 
various probability functions. It will be noted that this work includes the case 
considered by Woodbury [3]. 


2. Definitions. Letting $p_{r,s}$ be the probability of a success and $q_{r,s}$ the
probability of a failure on the rth trial after s successes, with $p_{r,s} + q_{r,s} = 1$, we
formulate the following definition.

DEFINITION 1. We will use the symbol $S_i$ to be a function of $p_{r,s}$ and x,
where x is the number of successes, with the following defined properties:

(a) $S_i = \prod_{t=0}^{i-1} p_{t+1,t} \prod_{t=i}^{x-1} p_{t+2,t}\; q_{i+1,i}$, where $i < x$ ($\prod_{t=0}^{-1}$ is defined to be 1).

(b) The product of $S_i$, $S_j$, and $S_k$ in any order (or any number of factors)
is defined to be

$$\prod_{t=0}^{i-1} p_{t+1,t} \prod_{t=i}^{j-1} p_{t+2,t} \prod_{t=j}^{k-1} p_{t+3,t} \prod_{t=k}^{x-1} p_{t+4,t}\; q_{i+1,i}\, q_{j+2,j}\, q_{k+3,k},$$

for $i \le j \le k \le x$, where if $S_t$ is a function of $x_t$ successes (t = i, j, k), then
the quantity x which appears in the formula for the product is the maximum
of $x_i$, $x_j$ and $x_k$. It should be noted that the product of $S_i$ and $S_j$ is not equal
to the value of $S_i$ multiplied by the value of $S_j$ but is given by the above
definition.

(c) $S_x = \prod_{t=0}^{x-1} p_{t+1,t}\; q_{x+1,x}$.

(d) We define $S_i(S_j + S_k)$ to be $S_iS_j + S_iS_k$, where the (+) sign represents
ordinary addition.

(e) $S_i^r$ will represent the product $S_iS_iS_i \cdots S_i$ to r factors, and from (b),
must be equal to $\prod_{t=0}^{i-1} p_{t+1,t} \prod_{t=i}^{x-1} p_{t+r+1,t} \prod_{t=1}^{r} q_{i+t,i}$.

(f) Then from (a) and (e), $S_i^m S_j^n$ will be

$$\prod_{t=0}^{i-1} p_{t+1,t} \prod_{t=i}^{j-1} p_{m+t+1,t} \prod_{t=j}^{x-1} p_{m+n+t+1,t} \prod_{t=1}^{m} q_{i+t,i} \prod_{t=1}^{n} q_{m+j+t,j}.$$

Received February 23, 1953; revised October 5, 1955.
¹ These results were included in a dissertation submitted to the University of Kentucky
in partial fulfillment of the requirements for the Ph.D. degree, June, 1952.
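To illustrate this definition, we consider $S_2^3 S_3$, which is

$$\prod_{t=0}^{1} p_{t+1,t} \prod_{t=2}^{2} p_{3+t+1,t} \prod_{t=3}^{x-1} p_{4+t+1,t} \prod_{t=1}^{3} q_{2+t,2} \prod_{t=1}^{1} q_{3+3+t,3} = p_{1,0}\, p_{2,1}\, p_{6,2}\, q_{3,2}\, q_{4,2}\, q_{5,2}\, q_{7,3} \prod_{t=3}^{x-1} p_{t+5,t}.$$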

We note that the multiplication of these symbols as defined follows the laws of 
positive integral and zero exponents. 

LEMMA 1. The probability P(n, x) can be expressed as a sum of products of S's
for all n and x, such that $n \ge x$.

PROOF. Let us consider x successes and n − x failures in the following
specified order. Suppose we have $a_0$ failures, then a success; $a_1$ failures, then a second
success; $a_2$ failures, then a third success; etc.; finally, the xth success and then
$a_x$ failures, where $a_0 + a_1 + a_2 + \cdots + a_x$ must equal n − x. By theorems from
elementary probability theory, the probability of x successes and n − x failures
in this specified order is

$$\prod_{t=1}^{a_0} q_{t,0}\; p_{a_0+1,0} \prod_{t=1}^{a_1} q_{a_0+t+1,1}\; p_{a_0+a_1+2,1} \cdots \prod_{t=1}^{a_{x-1}} q_{a_0+a_1+\cdots+a_{x-2}+t+x-1,\,x-1}\; p_{a_0+\cdots+a_{x-1}+x,\,x-1} \prod_{t=1}^{a_x} q_{a_0+\cdots+a_{x-1}+t+x,\,x},$$

which we may write in terms of our defined symbols as $S_0^{a_0} S_1^{a_1} S_2^{a_2} \cdots S_x^{a_x}$. Since
P(n, x) is the sum of terms such as this, it can be expressed as a sum of products
of S's for all n and x such that $n \ge x$.
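
The decomposition in Lemma 1 can be mirrored in code: each term $S_0^{a_0} \cdots S_x^{a_x}$ is the probability of one ordered pattern of failures and successes, and P(n, x) is the sum over all $(a_0, \dots, a_x)$ with total n − x. The sketch below is an illustration; the success law `p_success` is a hypothetical choice made only so that the probabilities depend on both the trial number and the number of previous successes, and the comparison method (a forward recursion over trials) is not part of the paper.

```python
from fractions import Fraction


def p_success(r, s):
    """Hypothetical law: probability of success on trial r after s successes."""
    return Fraction(s + 1, r + 2)


def ordered_path_probability(failures, p):
    """Probability of a_0 failures, success, a_1 failures, success, ..., a_x failures."""
    prob = Fraction(1)
    trial = 0
    last = len(failures) - 1
    for s, a_s in enumerate(failures):
        for _ in range(a_s):               # a_s failures after s successes
            trial += 1
            prob *= 1 - p(trial, s)
        if s < last:                       # then the (s + 1)-th success
            trial += 1
            prob *= p(trial, s)
    return prob


def compositions(total, parts):
    """All tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def prob_by_paths(n, x, p):
    """P(n, x) as the sum of the ordered-pattern probabilities of Lemma 1."""
    return sum(ordered_path_probability(a, p) for a in compositions(n - x, x + 1))


def prob_by_recursion(n, x, p):
    """P(n, x) by stepping through the trials one at a time."""
    dist = {0: Fraction(1)}
    for r in range(1, n + 1):
        nxt = {}
        for s, mass in dist.items():
            nxt[s] = nxt.get(s, 0) + mass * (1 - p(r, s))
            nxt[s + 1] = nxt.get(s + 1, 0) + mass * p(r, s)
        dist = nxt
    return dist.get(x, Fraction(0))


if __name__ == "__main__":
    print([prob_by_paths(5, x, p_success) for x in range(6)])
```

Exact rational arithmetic is used so that the two computations can be compared for equality rather than to within a tolerance.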


3. Development of P(n, x). Let us consider the following partial difference
equation

$$P(n, x) = p_{1,0}\, P''(n - 1, x - 1) + q_{1,0}\, P'(n - 1, x), \tag{1}$$

where $P''(n - 1, x - 1)$ represents the probability of x − 1 successes in n − 1
trials with the probability of success on the first of the n − 1 trials being $p_{2,1}$,
and where $P'(n - 1, x)$ is the probability of x successes in n − 1 trials with
the probability of success on the first of the n − 1 trials being $p_{2,0}$. The
boundary conditions for this equation are P(n, x) = 0 for x < 0, x > n, and P(0, 0) =
1. Using the generating function $G(x, \theta) = \sum_{k=x}^{\infty} P(k, x)\, \theta^k$, one may obtain,
under the given boundary conditions, a difference equation involving generating
functions. From (1) we have that

$$\sum_{k=x}^{\infty} P(k, x)\, \theta^k = \sum_{k=x}^{\infty} p_{1,0}\, P''(k - 1, x - 1)\, \theta^k + \sum_{k=x}^{\infty} q_{1,0}\, P'(k - 1, x)\, \theta^k,$$

which in turn gives

$$G(x, \theta) = p_{1,0}\, \theta\, G''(x - 1, \theta) + q_{1,0}\, \theta\, G'(x, \theta). \tag{2}$$





Now let us make use of the displacement operator E. Considering $p_{i,j}$ as a
function of i and j, we will use E operating on $p_{i,j}$ to be $p_{i+1,j+1}$. From the
properties of E as given in [2],

$$E(S_i) = E\left( \prod_{t=0}^{i-1} p_{t+1,t} \prod_{t=i}^{x-1} p_{t+2,t}\; q_{i+1,i} \right) = \prod_{t=1}^{i} p_{t+1,t} \prod_{t=i+1}^{x} p_{t+2,t}\; q_{i+2,i+1}.$$

Thus, $p_{1,0} E(S_i) = \prod_{t=0}^{i} p_{t+1,t} \prod_{t=i+1}^{x} p_{t+2,t}\; q_{i+2,i+1} = S_{i+1}$, where if $S_i$ is a
function of x successes, then $S_{i+1}$ is a function of x + 1 successes. Likewise,

$$p_{1,0}\, E(S_i S_j \cdots S_k) = S_{i+1} S_{j+1} \cdots S_{k+1}$$

and $p_{1,0}\, E(a_i S_i + a_j S_j + \cdots) = a_i S_{i+1} + a_j S_{j+1} + \cdots$, where the a's are
constants.

We will now simplify difference equation (2) by showing that $G''(x, \theta) =
EG(x, \theta)$ and $q_{1,0}\, G'(x, \theta) = S_0\, G(x, \theta)$. Since these generating functions are power
series in $\theta$, it will be sufficient to show that coefficients of like powers of $\theta$ in
each equation are equal. $P''(n, x)$ has exactly the same form as P(n, x), but has
the subscripts of each p and q increased by one (i.e., each $p_{i,j}$ in P(n, x) is $p_{i+1,j+1}$
in $P''(n, x)$). From the definition and the properties of E, it is evident that
$EP(n, x) = P''(n, x)$. Thus, since P(n, x) and $P''(n, x)$ are the coefficients of
$\theta^n$ in $G(x, \theta)$ and $G''(x, \theta)$, respectively, we have $EG(x, \theta) = G''(x, \theta)$.

In a like manner, from the properties of S and the definition of $P'(n, x)$, it
follows that $S_0$ multiplied by P(n, x) equals $q_{1,0}\, P'(n, x)$ and hence $S_0\, G(x, \theta) =
q_{1,0}\, G'(x, \theta)$.

By making these substitutions, difference equation (2) becomes

$$G(x, \theta) - S_0\, G(x, \theta)\, \theta = p_{1,0}\, EG(x - 1, \theta)\, \theta$$

or $(1 - S_0\theta)\, G(x, \theta) = p_{1,0}\, EG(x - 1, \theta)\, \theta$. Multiplying both sides of this equation
by $1 + S_0\theta + S_0^2\theta^2 + \cdots$, we obtain

$$G(x, \theta) = (1 + S_0\theta + S_0^2\theta^2 + \cdots)\, p_{1,0}\, EG(x - 1, \theta)\, \theta.$$

Then, using $1/(1 - S_0\theta)$ to represent $1 + S_0\theta + S_0^2\theta^2 + \cdots$, we have

$$G(x, \theta) = \frac{\theta}{1 - S_0\theta}\; p_{1,0}\, EG(x - 1, \theta). \tag{3}$$

Since $G(0, \theta) = 1/(1 - S_0\theta)$, the solution of (3) becomes

$$G(x, \theta) = \frac{\theta^x}{(1 - S_0\theta)(1 - S_1\theta) \cdots (1 - S_x\theta)}. \tag{4}$$

For ordinary multiplication, it is shown on page 313 of [3] that the coefficient
of $\theta^n$ in an expansion similar to this one is the xth divided difference of $S_0^n$, a
polynomial of degree n − x in the S's. Since our function S satisfies the laws of
multiplication for positive integral and zero exponents, the coefficient of $\theta^n$ can
thus be expressed as the xth divided difference of $S_0^n$. Or the probability of x
successes in n trials is $P(n, x) = \Delta^x S_0^n$, where the symbol $\Delta$ is the divided
difference.

It is of interest to see what happens to this expression when the probability of
success on a single trial depends only on the number of previous successes. Since
$P(n, x) = \Delta^x S_0^n$, any term of this polynomial may be written as $S_0^{n_0} S_1^{n_1} \cdots S_x^{n_x}$,
where the $n_i$ may take on values 0, 1, 2, ..., n − x. Since our $p_{i,j}$ is now
restricted so that it depends only on the number of previous successes (omitting
the first subscripts), each term becomes $p_0 p_1 \cdots p_{x-1}\, q_0^{n_0} q_1^{n_1} \cdots q_x^{n_x}$, or
$P(n, x) = p_0 p_1 \cdots p_{x-1}\, \Delta^x q_0^n$. This is the expression for P(n, x) that was obtained by
Woodbury in [3].
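
In this restricted case the symbols reduce to ordinary numbers, so the divided-difference expression can be evaluated directly. The sketch below is an illustration only: the success probabilities $p_s = 1/(s + 2)$ are a hypothetical choice (any values with distinct $q_s$ would do), and the comparison against a forward recursion over trials is an independent check, not a method from the paper.

```python
from fractions import Fraction
from math import prod


def divided_difference(nodes, f):
    """Divided difference f[z_0, ..., z_m] over distinct nodes."""
    table = [f(z) for z in nodes]
    for order in range(1, len(nodes)):
        table = [(table[i + 1] - table[i]) / (nodes[i + order] - nodes[i])
                 for i in range(len(table) - 1)]
    return table[0]


def woodbury(n, x, p):
    """P(n, x) = p_0 ... p_{x-1} times the x-th divided difference of q^n."""
    q = [1 - p[s] for s in range(x + 1)]
    return prod(p[:x]) * divided_difference(q, lambda z: z ** n)


def direct(n, x, p):
    """P(n, x) by stepping through the n trials; p[s] applies after s successes."""
    dist = [Fraction(1)] + [Fraction(0)] * n
    for _ in range(n):
        new = [Fraction(0)] * (n + 1)
        for s in range(n):
            new[s] += dist[s] * (1 - p[s])
            new[s + 1] += dist[s] * p[s]
        dist = new
    return dist[x]


if __name__ == "__main__":
    n = 5
    p = [Fraction(1, s + 2) for s in range(n + 1)]   # hypothetical p_s
    print([woodbury(n, x, p) for x in range(n + 1)])
    print([direct(n, x, p) for x in range(n + 1)])
```

The divided difference of $q^n$ over x + 1 nodes is the sum of all monomials $q_0^{n_0} \cdots q_x^{n_x}$ with $n_0 + \cdots + n_x = n - x$, which is why it reproduces the sum of terms described above.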

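By specifying the exact law by which the probability of success on a single
trial changes from trial to trial, we may obtain probabilities that determine
various desired distributions. As an example, let the probability of success on
the first trial be expressed as p/1, and let the numerator of this fraction be
increased by $\lambda$ for each success and the denominator increased by $\lambda$ for each trial.
For this distribution, we obtain, from Definition 1,

$$p_{r,s} = \frac{p + s\lambda}{1 + (r - 1)\lambda}, \qquad q_{r,s} = \frac{q + (r - s - 1)\lambda}{1 + (r - 1)\lambda},$$

and

$$S_0^r = \frac{p}{1 + r\lambda} \cdot \frac{p + \lambda}{1 + (r + 1)\lambda} \cdots \frac{p + (x - 1)\lambda}{1 + (r + x - 1)\lambda} \cdot \frac{q}{1} \cdot \frac{q + \lambda}{1 + \lambda} \cdots \frac{q + (r - 1)\lambda}{1 + (r - 1)\lambda},$$

$$S_i^r = \frac{p}{1} \cdot \frac{p + \lambda}{1 + \lambda} \cdots \frac{p + (i - 1)\lambda}{1 + (i - 1)\lambda} \cdot \frac{p + i\lambda}{1 + (r + i)\lambda} \cdots \frac{p + (x - 1)\lambda}{1 + (r + x - 1)\lambda} \cdot \frac{q}{1 + i\lambda} \cdot \frac{q + \lambda}{1 + (i + 1)\lambda} \cdots \frac{q + (r - 1)\lambda}{1 + (i + r - 1)\lambda}.$$

Since $S_0^r$ is equal to $S_i^r$ for i = 1, 2, 3, ..., x − 1, and since $\Delta^x S_0^n$ has $C_x^n$ terms,
then $P(n, x) = C_x^n S_0^{n-x}$, or

$$C_x^n\, \frac{p(p + \lambda) \cdots (p + [x - 1]\lambda)\; q(q + \lambda) \cdots (q + [n - x - 1]\lambda)}{1(1 + \lambda) \cdots (1 + [n - 1]\lambda)},$$

which is the probability of exactly x successes in n trials for the Polya
distribution as given in [1].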
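
The closed form for the Polya case can be compared with a trial-by-trial computation that uses only the stated rule (numerator increased by $\lambda$ per success, denominator by $\lambda$ per trial). The sketch below is an illustration; the values p = 1/3, $\lambda$ = 1/2 and n = 6 are arbitrary, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def polya_formula(n, x, p, lam):
    """C(n, x) p(p+lam)...(p+[x-1]lam) q(q+lam)...(q+[n-x-1]lam) / (1(1+lam)...(1+[n-1]lam))."""
    q = 1 - p
    num = Fraction(1)
    for k in range(x):
        num *= p + k * lam
    for k in range(n - x):
        num *= q + k * lam
    den = Fraction(1)
    for k in range(n):
        den *= 1 + k * lam
    return comb(n, x) * num / den


def polya_recursion(n, x, p, lam):
    """P(n, x) by stepping through the trials with p_{r,s} = (p + s lam)/(1 + (r-1) lam)."""
    dist = {0: Fraction(1)}
    for r in range(1, n + 1):
        nxt = {}
        for s, mass in dist.items():
            p_rs = (p + s * lam) / (1 + (r - 1) * lam)
            nxt[s] = nxt.get(s, 0) + mass * (1 - p_rs)
            nxt[s + 1] = nxt.get(s + 1, 0) + mass * p_rs
        dist = nxt
    return dist.get(x, Fraction(0))


if __name__ == "__main__":
    p, lam, n = Fraction(1, 3), Fraction(1, 2), 6    # illustrative values
    print([polya_formula(n, x, p, lam) for x in range(n + 1)])
```

Because every ordering of x successes and n − x failures has the same probability under this rule, the binomial coefficient simply counts the orderings.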


REFERENCES 
[1] W. Feller, Probability Theory and Its Applications, John Wiley and Sons, New York,
1950, p. 128.
[2] C. Jordan, Calculus of Finite Differences, Chelsea Publishing Co., New York, 1947,
pp. 5 and 18.
[3] Max A. Woodbury, "On a Probability Distribution," Ann. Math. Stat., Vol. 20 (1949),
pp. 311-315.





UNIFORM CONVERGENCE OF RANDOM FUNCTIONS WITH
APPLICATIONS TO STATISTICS¹


By Herman Rubin


Stanford University²


0. Introduction and Summary. In many statistical problems, we obtain func- 
tions of both the random variable and the parameters involved, from whose 
asymptotic behavior we may deduce the asymptotic behavior of certain esti- 
mates. In many of these cases, it is sufficient to demonstrate uniform converg- 
ence with probability one of these functions. In this paper, a set of sufficient 
conditions for this is given, and we show how these results may be applied to 
some statistical problems. 


1. Statement of the theorem.³ Let $X_1, \dots, X_n, \dots$ be a sequence of
independent and identically distributed variables with values in an arbitrary space
X. Let T be a compact topological space, and let f be a complex-valued function
on $T \times X$, measurable in x for each $t \in T$. Let P be the common distribution
of the $X_i$.

THEOREM 1. If there is an integrable g such that $|f(t, x)| < g(x)$ for all $t \in T$
and $x \in X$, and if there is a sequence $S_i$ of measurable sets such that

$$P\left(X - \bigcup_{i=1}^{\infty} S_i\right) = 0,$$

and for each i, f(t, x) is equicontinuous in t for $x \in S_i$, then with probability one,

$$\frac{1}{n} \sum_{k=1}^{n} f(t, X_k) \to \int f(t, x)\, dP(x)$$

uniformly for $t \in T$, and the limit function is continuous.

We may assume the $S_i$ are monotonically increasing. Let $\epsilon > 0$ be given. Then
for some i, $\int_{X - S_i} g(x)\, dP(x) < \epsilon/5$.

Since f(t, x) is equicontinuous in t for $x \in S_i$ and T is compact, there exist
$t_1, \dots, t_m$ and open subsets $N_1, \dots, N_m$ of T such that $\bigcup_{j=1}^{m} N_j = T$, $t_j \in N_j$,
and for $t \in N_j$ and $x \in S_i$, $|f(t, x) - f(t_j, x)| < \epsilon/4$. Let $Y_{jk} = f(t_j, X_k)$;
$Z_k = g(X_k)$ if $X_k \notin S_i$; $Z_k = 0$ otherwise.

By the strong law of large numbers, we may select an N such that, if
$A_j = E(Y_{jk})$ and $\delta > 0$,

$$P\left( \text{for some } n > N,\ \left| \frac{1}{n} \sum_{k=1}^{n} Y_{jk} - A_j \right| \ge \epsilon/4 \right) < \delta/2m, \qquad j = 1, \dots, m,$$


Received November 8, 1952.

¹ Research performed under an ONR contract. Presented to the Institute of
Mathematical Statistics, June 15, 1951.

² Present address, University of Oregon.

³ An essentially equivalent theorem was proved by Le Cam in [2].


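$$P\left( \text{for some } n > N,\ \frac{1}{n} \sum_{k=1}^{n} Z_k \ge \epsilon/4 \right) < \delta/2.$$

But if $t \in N_j$, $|f(t, X_k) - Y_{jk}| < \epsilon/4 + 2Z_k$. Hence,

$$P\left( \text{for some } n > N \text{ and some } j \text{ and } t \in N_j,\ \left| \frac{1}{n} \sum_{k=1}^{n} f(t, X_k) - A_j \right| \ge \epsilon \right) < \delta.$$

Therefore, $\frac{1}{n} \sum_{k=1}^{n} f(t, X_k)$ converges uniformly to a continuous function with
probability one. By the strong law of large numbers, that function is

$$\int f(t, x)\, dP(x).$$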


2. Applications. As an application of this theorem, we see that the sample
characteristic function converges to the population characteristic function
uniformly with probability one in any bounded interval, since $f(t, x) = e^{itx}$
satisfies the conditions of the theorem.
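
The uniform convergence of the sample characteristic function can be observed in a small simulation. The sketch below is an illustration under assumptions chosen for the example: a standard normal population, the interval [-3, 3], and a grid of 61 points standing in for the supremum over the interval (so the computed value is a lower bound on the true supremum).

```python
import cmath
import math
import random


def empirical_cf(sample, t):
    """Sample characteristic function (1/n) sum exp(i t X_k)."""
    return sum(cmath.exp(1j * t * x) for x in sample) / len(sample)


def sup_error(n, seed=0, t_max=3.0, steps=60):
    """Largest gap on a grid between the sample c.f. and exp(-t^2 / 2)."""
    rng = random.Random(seed)
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    grid = [-t_max + 2.0 * t_max * k / steps for k in range(steps + 1)]
    return max(abs(empirical_cf(sample, t) - math.exp(-t * t / 2.0))
               for t in grid)


if __name__ == "__main__":
    for n in (100, 500, 2000):
        print(n, sup_error(n))
```

Running it for increasing n shows the largest gap over the grid shrinking, as the theorem predicts for any bounded interval.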

It may happen that $\log L(x \mid \theta) = f(x, \theta)$ satisfies the conditions of the
theorem. For example, for the multivariate normal, the Poisson, Cauchy, $\chi^2$,
double exponential, and many other distributions, we are led to the almost
certain convergence of maximum likelihood estimates to the true values if the
parameter is restricted to a compact set.

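More difficult estimation procedures can also be shown to be consistent. For
example, consider a problem of Reiersøl [4]. The model is

$$x_i = \xi_i \cos\alpha + u_i,$$
$$y_i = \xi_i \sin\alpha + v_i,$$

where $u_i$ and $v_i$ have a joint normal distribution, $\xi_i$ is not normal, and $\xi_i$, $(u_i, v_i)$
are independent. Let $\rho = t \sin\beta$, $\sigma = -t \cos\beta$, and let

$$\varphi(t, \beta, X_j) = e^{i(\rho x_j + \sigma y_j)}.$$

Then $\frac{1}{n}\sum_{j=1}^{n} \varphi(t, \beta, X_j) \to \psi(t, \beta)$ uniformly with probability one for $t$ in any
finite interval. Let

$$\chi(\beta) = \int \bigl|\psi(2t, \beta) - \psi^3(t, \beta)\,\psi(-t, \beta)\bigr|^2\, d\lambda(t),$$

where $\lambda$ is a bounded monotone function such that for any $\varepsilon > 0$,
$\lambda(\varepsilon) - \lambda(-\varepsilon) > 0$. Then $\chi$ is a periodic function of period $\pi$, and $\chi(\beta) = 0$
only for $\beta = \alpha + k\pi$. Let

$$\psi_n(t, \beta) = \frac{1}{n}\sum_{j=1}^{n} \varphi(t, \beta, X_j),$$

$$\chi_n(\beta) = \int \bigl|\psi_n(2t, \beta) - \psi_n^3(t, \beta)\,\psi_n(-t, \beta)\bigr|^2\, d\lambda(t).$$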







202 HERMAN RUBIN 


Since $\psi_n$ is bounded, it follows that $\chi_n(\beta) \to \chi(\beta)$ uniformly with probability one.
Hence, if $b_n$ minimizes $\chi_n(\beta)$, it follows that $b_n \to \alpha$ with probability one, in the
sense of convergence mod $\pi$.
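
The estimator $b_n$ can be tried numerically. The following is only a sketch under assumptions that are not in the paper: $\xi_i$ is taken to be exponential, $u_i$ and $v_i$ are independent normal with standard deviation 0.25, $\lambda$ is replaced by equal weights on a finite grid of $t$ values in $[-2, 2]$, and the minimization is a plain grid search over $\beta \in [0, \pi)$. The criterion is the one discussed above, built from the sample characteristic function of $x \sin\beta - y \cos\beta$.

```python
import numpy as np

def chi_n(beta, x, y, t, weights):
    """Weighted sum over the t-grid of |psi_n(2t) - psi_n(t)^3 psi_n(-t)|^2."""
    z = x * np.sin(beta) - y * np.cos(beta)      # rho*x + sigma*y equals t*z

    def psi_n(s):
        # sample characteristic function of z at the points s
        return np.exp(1j * np.outer(s, z)).mean(axis=1)

    gap = psi_n(2 * t) - psi_n(t) ** 3 * psi_n(-t)
    return float(np.sum(weights * np.abs(gap) ** 2))

rng = np.random.default_rng(2)                   # fixed seed, illustrative only
n, alpha = 10_000, 0.6                           # assumed sample size and true angle
xi = rng.exponential(1.0, n)                     # assumed non-normal xi
u, v = 0.25 * rng.standard_normal((2, n))        # assumed independent normal errors
x, y = xi * np.cos(alpha) + u, xi * np.sin(alpha) + v

t = np.linspace(-2.0, 2.0, 21)
weights = np.full(t.size, 1.0 / t.size)          # discrete stand-in for lambda
betas = np.linspace(0.0, np.pi, 60, endpoint=False)
b_n = float(betas[np.argmin([chi_n(b, x, y, t, weights) for b in betas])])

# distance between b_n and alpha in the sense of convergence mod pi
dist_mod_pi = abs((b_n - alpha + np.pi / 2) % np.pi - np.pi / 2)
print(b_n, dist_mod_pi)
```

With a moderate sample the minimizer should be read as a rough estimate only; the sketch is meant to show the construction, not its rate of convergence.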

This result is stronger than that obtained by Neyman about Reiersøl's problem.
The method can also be extended to Neyman's extension of the problem [3].

We can, in fact, obtain some very strong results on the existence of consistent 
estimates. 

THEOREM 2. Let $\mathcal{F}$ be a family of distributions on Euclidean $n$-space. Let $\pi$ be
a continuous function mapping $\mathcal{F}$ into the topological space $\Theta$. Then there exists a
sequence $p_k$ of functions on Euclidean $kn$-space to $\Theta$ such that if $X_1, \dots, X_k, \dots$
are independently distributed with distribution function $F \in \mathcal{F}$, then
$\lim_{k \to \infty} p_k(X_1, \dots, X_k) = \pi(F)$ with probability one.

In other words, any continuous parameter is consistently estimable. The
converse is not true, since moments, which are not continuous parameters, are
consistently estimable. It is an unsolved problem which functions $\pi$ of a
distribution in a family $\mathcal{F}$ are consistently estimable, and even whether the
topological structure of the family $\mathcal{F}$ and the topological properties of the
function $\pi$ are sufficient to characterize consistently estimable parameters $\pi$.

Let us proceed to the proof of the theorem. 

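Let $\lambda$ be a non-negative finite measure on Euclidean $n$-space such that every
open set has positive measure. For each $F \in \mathcal{F}$, let

$$\psi(t_1, \dots, t_n, F) = \mathcal{E}\Bigl(\exp i \sum_{j=1}^{n} t_j x_j\Bigr),$$

i.e., $\psi(t_1, \dots, t_n, F)$ is the characteristic function of $F$ evaluated at $t_1, \dots, t_n$.
Similarly, let

$$\psi_k(t_1, \dots, t_n, X_1, \dots, X_k) = \frac{1}{k}\sum_{h=1}^{k} \exp\Bigl(i \sum_{j=1}^{n} t_j X_{hj}\Bigr).$$

Then define

$$\rho_k^2(X_1, \dots, X_k) = \inf_{F \in \mathcal{F}} \int \bigl|\psi_k(t_1, \dots, t_n, X_1, \dots, X_k) - \psi(t_1, \dots, t_n, F)\bigr|^2\, d\lambda(t_1, \dots, t_n),$$

and let $p_k$ be any function such that for every $X_1, \dots, X_k$ there is an $F_k \in \mathcal{F}$
with $p_k(X_1, \dots, X_k) = \pi(F_k)$ satisfying

$$\int \bigl|\psi_k(t_1, \dots, t_n, X_1, \dots, X_k) - \psi(t_1, \dots, t_n, F_k)\bigr|^2\, d\lambda(t_1, \dots, t_n) < \rho_k^2(X_1, \dots, X_k) + \frac{1}{k}.$$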







From Theorem 1, we see that $\psi_k(t_1, \dots, t_n, X_1, \dots, X_k)$ approaches
$\psi(t_1, \dots, t_n, F)$ uniformly with probability one for $t_1, \dots, t_n$ in any bounded
set. Therefore,

$$\int \bigl|\psi_k(t_1, \dots, t_n, X_1, \dots, X_k) - \psi(t_1, \dots, t_n, F)\bigr|^2\, d\lambda(t_1, \dots, t_n)$$

approaches zero with probability one. Hence,

$$\int \bigl|\psi(t_1, \dots, t_n, F_k) - \psi(t_1, \dots, t_n, F)\bigr|^2\, d\lambda(t_1, \dots, t_n)$$

approaches zero with probability one, and thus $F_k \to F$ almost certainly. The
result then follows from the continuity of $\pi$.
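
The construction in the proof can be imitated on a computer for a small parametric family. The sketch below is hypothetical in every particular: the family is taken to be $N(\mu, 1)$ with $\mu$ on a finite grid, the parameter is the mean $\mu$, and $\lambda$ is replaced by equal weights on a grid of $t$ values. It picks the family member whose characteristic function is closest, in weighted squared distance, to the sample characteristic function.

```python
import numpy as np

def min_distance_estimate(sample, mus, t, weights):
    """Return the mu whose N(mu, 1) c.f. is closest to the sample c.f. on the t-grid."""
    psi_k = np.exp(1j * np.outer(t, sample)).mean(axis=1)       # sample c.f.
    psi_F = np.exp(1j * np.outer(mus, t) - t**2 / 2)            # one row per candidate mu
    dist2 = (weights * np.abs(psi_F - psi_k) ** 2).sum(axis=1)  # weighted squared distance
    return float(mus[np.argmin(dist2)])

rng = np.random.default_rng(3)                   # fixed seed, illustrative only
mus = np.linspace(-3.0, 3.0, 601)                # assumed family: N(mu, 1), mu on a grid
t = np.linspace(-3.0, 3.0, 61)
weights = np.full(t.size, 1.0 / t.size)          # discrete stand-in for lambda
estimate = min_distance_estimate(rng.normal(0.7, 1.0, 2_000), mus, t, weights)
print(estimate)
```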

Similar results can be obtained in the case of a continuously identified
parameter. If a structure $S$ generates the distribution $F$, we may ask whether a
function $\varphi$ defined on the space $\mathcal{S}$ of structures is determined by the distribution $F$.
If so, we say [1] that $\varphi$ is identified at $F$.

Let us formulate the preceding definition without regard to the structure $S$.
We obtain for each $F$ in a class $\mathcal{F}$ of distributions, a non-null set $\Phi(F)$ in the
parameter space. The condition that $\Phi$ is identified at $F$ then becomes that
$\Phi(F)$ has one element.

Let us say that $\Phi$ is continuously identified at $F$ with respect to $\mathcal{F}$ if for every
sequence $F_1, \dots, F_k, \dots$ of distributions of $\mathcal{F}$ such that $F_k \to F$, and for any
sequence $\theta_1, \dots, \theta_k, \dots$ such that $\theta_k \in \Phi(F_k)$ for all $k$, $\theta_k$ converges to the
one element of $\Phi(F)$.

Then by a method similar to that of Theorem 2 we obtain 

THEOREM 3. Let $\mathcal{F}$ be a family of distributions on Euclidean $n$-space and let $\Phi$
map $\mathcal{F}$ into the set of non-null subsets of $\Theta$. Then there exists a sequence $p_k$ of
functions on Euclidean $kn$-space to $\Theta$ such that for any $F \in \mathcal{F}$, if $X_1, \dots, X_k, \dots$ are
independently distributed with distribution function $F$ and $\Phi$ is continuously
identifiable at $F$ with respect to $\mathcal{F}$, then $p_k(X_1, \dots, X_k)$ approaches the element of $\Phi(F)$
with probability one.


REFERENCES 

[1] L. Hurwicz, "Generalization of the concept of identification," Statistical Inference in
Dynamic Economic Models, John Wiley and Sons, New York, 1950, pp. 245-257.

[2] L. Le Cam, "On some asymptotic properties of maximum likelihood estimates and related
Bayes' estimates," University of California Publications in Statistics, Vol.
1 (1953), pp. 277-300.

[3] J. Neyman, "Existence of consistent estimates of the directional parameter in a linear
structural relation between two variables," Ann. Math. Stat., Vol. 22 (1951), pp.
497-512.

[4] O. Reiersøl, "Identifiability of a linear relation between variables which are subject to
error," Econometrica, Vol. 18 (1950), pp. 375-389.





204 ABSTRACTS 


CORRECTION TO “ON THE MAXIMUM NUMBER OF CONSTRAINTS OF 
AN ORTHOGONAL ARRAY” 


By Esther Seiden


Northwestern University 


The proof of Lemma 2 of the paper mentioned in the above title (Ann. Math. 
Stat. Vol. 26 (1955), pp. 132-135) is incorrect. The number 20 on top of page 134 
should be replaced by 15 and hence no contradiction has been reached with 
nis = 45. Fortunately the assertion made in the above mentioned lemma remains valid.
The last seven lines of page 133 and the first two lines of page 134 should be 
deleted and replaced by the following: 

This means that every 4-rowed orthogonal subarray must satisfy the equality 
nis = 1, contrary to Lemma 1 of the paper “Further remark on the maximum 
number of constraints of an orthogonal array” (to appear in the December issue,
Ann. Math. Stat. Vol. 26 (1955)), which asserts that no such array exists.

I wish to thank W. S. Connor for pointing out the mistake in my former 
proof. 




ABSTRACTS OF PAPERS 


(Abstracts of papers presented at the New York meeting of the Institute, December 27-30, 1955) 


1. The Midrange of a Sample as an Estimator of the Population Midrange, 


Paul R. Rider, Wright-Patterson Air Force Base.


A study is made of the distribution of the midranges of samples from five different 
symmetric populations of limited range, and of the relative efficiency of midrange and mean 
in estimating the population midrange, or mean, or median. It is found that the midrange is 
more efficient than the mean for all of the populations considered, and that this efficiency 
increases as the standardized fourth moment decreases. 
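
A small simulation can illustrate the kind of comparison described. The abstract does not name the five populations, so the sketch below assumes a rectangular population on $(0, 1)$, a sample size of 20, and 20,000 replications; all three are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)                   # fixed seed, illustrative only
n, reps = 20, 20_000                             # assumed sample size and replications
samples = rng.uniform(0.0, 1.0, size=(reps, n))  # assumed rectangular population

midrange = (samples.min(axis=1) + samples.max(axis=1)) / 2
mean = samples.mean(axis=1)

# both statistics estimate the population midrange 0.5; compare their variances
var_midrange, var_mean = midrange.var(), mean.var()
relative_efficiency = var_mean / var_midrange    # > 1 means the midrange is more efficient
print(var_midrange, var_mean, relative_efficiency)
```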


2. Distribution of the Product of Maximum Values in Samples from a Rectangular
Population, Paul R. Rider, Wright-Patterson Air Force Base,
(By Title).


The distribution of the product of maximum values in samples from a rectangular dis- 
tribution is derived. Results are obtained for the case of two samples of different sizes and 
for k samples of the same size. 


3. A Note on Non-Recurrent Random Walks, Cyrus Derman, Columbia
University, (By Title). 


Let $\{X_i\}$, $i = 1, \dots$, be a sequence of independent and identically distributed random
variables with density function $f(x)$ and $EX_i = \lambda > 0$. Let $\{S_n\}$, $n = 1, \dots$, be the
sequence of cumulative sums $S_n = \sum_{i=1}^{n} X_i$, $H(x) = \sum_{n=1}^{\infty} P(S_n < x)$, and $h(x) = H'(x)$.
Let $A$ be any Borel set of the positive real numbers and $m(A)$ denote its Lebesgue measure.







Chung and Derman (to appear in the Pacific J. Math.) proved that (I) $\Phi(A) = P(S_n \in A$
infinitely often$) = 0$ if $m(A) < \infty$, and (II) that $\Phi(A) = 1$ if $m(A) = \infty$, provided that as
$x \to \infty$, $0 < \liminf h(x) \le \limsup h(x) < \infty$. The following theorem was proved: (i) If
$\lambda < \infty$ and $\limsup h(x) < \infty$, then $\liminf h(x) > 0$ and consequently (I) and (II) hold.
(ii) If $\liminf h(x) > 0$, then (I) holds. (iii) If $\lambda < \infty$, and if there exists a constant $\alpha > 0$
and an interval $(a, b)$ such that $f(x) \ge \alpha$ for $x \in (a, b)$, then $\liminf h(x) > 0$.


4. Statistical Spectral Analysis, I: Consistent Asymptotically Normal Estimates 
of the Covariance Function and Spectral Averages, EMANUEL PARZEN, 
Columbia University, (By Title). 


Let the wide-sense stationary time series $x(t)$ have mean $m$, covariance
$R(v) = E\, x(t)\, x(t + v) - m^2$, spectral distribution function $F(\omega)$ such that
$R(v) = \int e^{iv\omega}\, dF(\omega)$, and spectral density function $f(\omega) = F'(\omega)$. The problem of
statistical spectral analysis is to estimate these quantities on the basis of an observed
sample. We shall be especially concerned with finding consistent and asymptotically
normal estimators, for both continuous and discrete parameter processes, under the
following assumptions: (1) $R(v)$ is absolutely and square summable; (2) the process
$y(t) = x(t) - m$ is stationary of order 4; (3) the non-Gaussian part of the fourth moment,
or the fourth cumulant,
$Q(v_1, v_2, v_3) = E\, y(t)\, y(t + v_1)\, y(t + v_2)\, y(t + v_3) - R(v_1) R(v_2 - v_3) - R(v_2) R(v_3 - v_1) - R(v_3) R(v_1 - v_2)$
is absolutely summable; (4) there is an absolutely integrable function $g(\omega_1, \omega_2, \omega_3)$
such that

$$Q(v_1, v_2, v_3) = \iiint d\omega_1\, d\omega_2\, d\omega_3\, \exp i[\omega_1 v_1 + \omega_2 v_2 + \omega_3 v_3]\, g(\omega_1, \omega_2, \omega_3).$$

Examples are given of processes satisfying these assumptions; they are examples of
multilinear processes. Given $x(t)$, for $0 \le t \le T$ (or for $t = 1, \dots, T$ in the discrete
case), define the sample mean $m_T$, the sample covariance $R_T(v)$, the sample spectral
density (or periodogram) $f_T(\omega)$, and sample spectral averages $J_T(A)$ by

$$T m_T = \int_0^T x(t)\, dt, \qquad T R_T(v) = \int_0^{T - |v|} [x(t) - m_T]\,[x(t + |v|) - m_T]\, dt,$$

$$R_T(v) = \int e^{iv\omega} f_T(\omega)\, d\omega, \qquad J_T(A) = \int A(\omega) f_T(\omega)\, d\omega$$

for suitable $A(\omega)$. Expressions are obtained for the limit, as $T \to \infty$, of
$T E(m_T - m)^2$, $T E|R_T(v) - R(v)|^2$, and $E|J_T(A) - J(A)|^2$,
where $J(A) = \int A(\omega) f(\omega)\, d\omega$.


5. Statistical Spectral Analysis, II: Asymptotic Mean Square Error of a Class 
of Estimates of the Spectral Density, Emanuel Parzen, Columbia
University. 


It is well known that the sample spectral density (or periodogram) $f_T(\omega)$ is not a
consistent estimate of the spectral density. A class of consistent estimates may be found in the
following way (where we write out the formulas only for the discrete parameter case, noting
that similar statements hold for the continuous case). Define
$f_T^*(\omega) = (2\pi)^{-1} \sum_{|v| \le T} e^{-iv\omega}\, k(B_T v)\, R_T(v)$, where $B_T$ is a sequence of constants such that
$B_T \to 0$ and $T B_T \to \infty$ as $T \to \infty$,
and the kernel $k(u)$ is defined for all real $u$ as an even, bounded, square integrable function,
with Fourier transform $K(\omega)$. It is assumed to satisfy $k(0) = 1$, and
$\int_{-T}^{T} |k(u)|\, du \le M T^{1 - \varepsilon}$ for some $\varepsilon > 0$ and constant $M$. Various estimates that have been proposed for the
spectral density (Bartlett, Daniell, Grenander, Tukey) may be regarded as special instances
of $f_T^*(\omega)$. To study the properties of $f_T^*(\omega)$, and to form a theory of the optimum estimate of







this form, we need to know the mean square error $E|f_T^*(\omega) - f(\omega)|^2$. An asymptotic
expression for it can be obtained from the following two theorems.

Theorem I: $T B_T \operatorname{Var}[f_T^*(\omega)] \to |f(\omega)|^2 \int k^2(u)\, du\, \{1 + \delta(0, \omega)\}$ where $\delta(0, \omega) = 1$ or $0$
according as $\omega = 0$ or $\ne 0$.

Theorem II: Let $r > 0$ be such that $\sum |v|^r |R(v)| < \infty$, and $k^{(r)} = \lim_{u \to 0} (1 - k(u))/|u|^r$
is finite. Then $B_T^{-2r} |E f_T^*(\omega) - f(\omega)|^2 \to |k^{(r)} f^{(r)}(\omega)|^2$ where $f^{(r)}(\omega) = (2\pi)^{-1} \sum e^{-iv\omega} |v|^r R(v)$.


6. A Central Limit Theorem for Multilinear Stochastic Processes, Emanuel
Parzen, Columbia University, (By Title).


A definition of a multilinear process is given. Intuitively, a stochastic process $x(t)$,
defined for all real $t$, is said to be multilinear if it arises from a process with independent
increments by means of passage through a finite bank of "linear filters" and "polynomial
law instantaneous devices". Many physically observed stochastic processes may be
assumed to arise in this way. Let $S_T = \int_0^T x(t)\, dK(t)$ and $S_N = \sum_{n=1}^{N} x(t_n)\, a_n$, for some
sequence of points $t_n \to \infty$, constants $a_n$, and weighting function $K(t)$. Conditions are given
in terms of moments, in order that the normalized random variables $(S_T - ES_T)/\sigma S_T$
and $(S_N - ES_N)/\sigma S_N$ tend in distribution to a normal law with zero mean and unit variance.


7. An Extension of Cramér’s Theorem 20.6 to Random Functions with Values
in a Metric Space, Emanuel Parzen, Columbia University, (By Title).


Let $(\Omega, \mathcal{A}, P)$ be a probability space, and let $(R, \rho; \mathcal{B})$ be a metric measurable space, by
which we mean that $R$ is metrized by $\rho$, and $\mathcal{B}$ is the minimal $\sigma$-field over the open sets in
$R$. A random function $X$ on $\Omega$ to $R$ is $\mathcal{A}$-measurable if $\mathcal{A}$ contains the inverse image under
$X$ of every set in $\mathcal{B}$; then $X$ generates a probability $P_X$ on $\mathcal{B}$. For a sequence of
$\mathcal{A}$-measurable functions $X_n$, define $X_n$ to converge in distribution to $X$ (denoted
$X_n \xrightarrow{D} X$) if, for every bounded real-valued $\mathcal{B}$-measurable function $f(x)$ on $R$ whose set
of discontinuities is of $P_X$-measure 0, (*) $\int f(X_n)\, dP \to \int f(X)\, dP$. By a result of
P. P. Billingsley (Ph.D. thesis, Princeton, 1955; Th. 1.1), $X_n \xrightarrow{D} X$ if, and only if, (*)
holds for every bounded uniformly continuous function on $R$. Note also that, if $X$ is a
constant, then $X_n \xrightarrow{D} X$ if, and only if, $\rho(X_n, X) \to 0$ in probability. Extension of
Theorem 20.6 in Cramér (Mathematical Methods of Statistics, Princeton, 1946): Let $X_n$, $X$
be $\mathcal{A}$-measurable functions to $(R_1, \rho_1; \mathcal{B}_1)$ and let $Y_n$, $Y$ be $\mathcal{A}$-measurable functions to
$(R_2, \rho_2; \mathcal{B}_2)$. Let $R$ be the Cartesian product of $R_1$ and $R_2$, and let $\rho$ be a metric on $R$
which agrees with the metrics $\rho_1$ and $\rho_2$, and such that $(X_n, Y_n)$, $(X, Y)$ are
$\mathcal{A}$-measurable to $(R, \rho; \mathcal{B})$. Suppose $X_n \xrightarrow{D} X$, $Y_n \xrightarrow{D} Y$, and $Y$ is a constant. Then
$(X_n, Y_n) \xrightarrow{D} (X, Y)$. Proof. It suffices to show $\int f(X_n, Y_n)\, dP \to \int f(X, Y)\, dP$ for any
bounded uniformly continuous function $f(x, y)$ on $R$. Clearly,
$\int f(X_n, Y)\, dP \to \int f(X, Y)\, dP$. Let $\delta(\varepsilon)$ be such that $|f(X_n, Y_n) - f(X_n, Y)| < \varepsilon$ for
$\rho_2(Y_n, Y) < \delta(\varepsilon)$. Since $\int |f(X_n, Y_n) - f(X_n, Y)|\, dP \le \varepsilon + P\{\rho_2(Y_n, Y) > \delta(\varepsilon)\}$, the
desired conclusion may now be inferred.







8. Orthogonality and Fractional Replication of Factorial Experiments, Allan
Birnbaum, Columbia University.


A simple characterization of orthogonal factorial designs is derived from the condition 
for orthogonality of appropriate vector subspaces of the sample space. This leads naturally 
to: (a) the definitions of various classes of orthogonal designs, some of them standard (e.g., 
Latin squares) and some less familiar (e.g., ‘‘Latin rectangles’’); (b) some lower bounds on 
the fraction of replication which is consistent with orthogonality; (c) some elementary 
methods of construction of orthogonal fractional replicates, which in some cases can be 
shown to consist of a smallest possible fractional. Examples of such fractional replicates, 
including cases of factors at unequal numbers of levels, are given. 


9. On the Second Sample Size Function of a Bayes Two-Stage Test for the
Mean, Morris Skibinsky, Purdue University.


This paper investigates in detail the second sample size function of a Bayes two-stage 
rule that decides between two possible values for the mean of a normal distribution which 
has unit variance. The second sample size is the greatest integer less than or equal to a
number, $\hat{y}(\text{const.} \times \log(r_m/Wg), W, M)$, where $\hat{y}(t, \gamma, \mu)$ is, for fixed values of its
arguments, a value of $y$ for which a certain function, $U(y, t, \gamma, \mu)$, is absolutely minimum; $m$
is the size of the first sample, $r_m$ the value of the probability ratio from the first sample;
$W$ and $g$ the ratios, respectively, of the simple wrong decision losses and the a priori
probabilities associated with the two possible means; and $M$ is the minimum wrong decision loss.
Certain monotonicity, symmetry, and continuity properties of $\hat{y}$, and functions related to
it, are proved, and an asymptotic expression for the function is found when the minimum 
wrong decision loss is large. A subsequent paper, continuing this investigation, will consider 
Bayes two-stage rules having optimum properties with respect to expected overall sample 
size among rules of the same power. 


10. A New Estimation Procedure for a Linear Combination of Exponentials,
(Preliminary Report), Richard G. Cornell, Oak Ridge National Laboratory
and Virginia Polytechnic Institute.


A new estimation procedure is developed for the parameters of the model
$y_{ij} = \alpha_1 e^{-\lambda_1 t_i} + \alpha_2 e^{-\lambda_2 t_i} + \cdots + \alpha_p e^{-\lambda_p t_i} + e_{ij}$. The errors $e_{ij}$ are independently and normally distributed
about mean zero with equal variances. The parameters $\lambda_k$ are restricted to be positive and
the observation points ¢; are equally spaced. The number of points at which observations 
are taken is specified to be an integral multiple of the number of parameters. Also, equal 
numbers of observations are required at each observation point. Estimates are obtained by 
forming as many independent sums from the observations y;; as there are parameters, 
equating these sums to their expectations, and solving for estimates of the parameters. 
A computationally simple, non-iterative solution is found. The resultant estimators are 
not only asymptotically normally distributed, but are also consistent, sufficient and asymp- 
totically efficient. The limiting properties are demonstrated as either the number of ob- 
servation points or the number of observations per observation point grows infinitely large. 


11. A Note on Weighted Randomization, D. R. Cox, University of North 
Carolina. 


The standard methods of randomization used in experimental design consist of selecting 
an arrangement at random from a set S of similar arrangements, giving each arrangement 







in the set equal chance of selection. As is well known this device makes the standard designs 
unbiased in the sense that, under weak assumptions, the randomization expectations of 
linear and quadratic functions of the observations agree with their values as calculated 
from an appropriate linear model with random residuals. It is also known that these results 
do not hold when an adjustment for concomitant variation is made by analysis of covariance. 
In the present paper it is pointed out that a randomization justification for the covariance 
procedure can be provided if weighted randomization is used, i.e. if the arrangement for use 
is selected giving different arrangements in the set S appropriate unequal chances of selec- 
tion. Possible applications are considered briefly. 


12. On the Analysis of Incomplete Block Designs, Marvin ZELEN, National 
Bureau of Standards. 


Let there be $2v$ normal populations which can be divided into two sets of $v$ populations
each, such that the unknown parameters of each set are $(\mu_i, \sigma_1^2)$ and $(\mu_i, \sigma_2^2)$, $i = 1, 2, \dots, v$.
Consider the null hypothesis $H_0$: ($\mu_i = 0$ for $i = 1, 2, \dots, v$) against the alternative
hypothesis $H_1$: ($\mu_i \ne 0$ for $i = 1, 2, \dots, v$). If a sample of size $r_j$ is made for each of the $v$
populations of the $j$th set ($j = 1, 2$), then one can test $H_0$ using two independent $F$-ratios. The
main problem is to combine the two independent tests of significance into one single test 
having (perhaps) greater power than either of the individual tests. This problem arises in 
the analysis of incomplete block designs where one set corresponds to the intra-block 
analysis and the other to the inter-block analysis. The object of this paper is to show that 
exact statistical tests do exist for combining intra- and inter-block information. Methods 
are discussed for combining the two tests and a comparison of the power function is made 
for particular numerical values of the alternative hypothesis. 


13. A Remark to Wald’s Paper: “On a Statistical Problem Arising in the
Classification of an Individual into One of Two Groups,” Junjiro Ogawa,
University of North Carolina.


In the paper above mentioned, the late Professor A. Wald proposed a statistic U for the 
use in classification procedure and considered its exact sampling distribution. His result is 
too complicated to describe here. His proof was divided into nine lemmas, and each lemma 
was proved by an ingenious method. But, at least in the author’s opinion, the proofs of his 
sixth and seventh lemmas can be improved with the help of the invariant measure defined 
on the Grassmann manifold, which consists of p-planes in (n + 2)-dimensional Euclidean 
space. The author presents new proofs of these lemmas as an example of applications of 
the theory of orthogonal group manifolds developed by A. T. James in 1954 (Ann. Math. 
Stat., Vol. 25). 


14. Consistency and Optimum Properties of Some Two-Sample Tests, Julius
R. Blum, Indiana University, and Lionel Weiss, University of Virginia.


Let $X_1, \dots, X_m$ be a sample from the uniform distribution on the unit interval and let
$Y_1, \dots, Y_n$ be a sample with density $g(y)$ on the unit interval. Let $Z_0 = 0$, $Z_{n+1} = 1$, and
$Z_1 < \cdots < Z_n$ be the order statistics corresponding to $Y_1, \dots, Y_n$. For each $i = 1, \dots,$
$n + 1$ let $S_i$ be the number of $X$'s in the interval $[Z_{i-1}, Z_i]$, and for each non-negative
integer $r$, let $Q_n(r)$ be the proportion among $S_1, \dots, S_{n+1}$ which are equal to $r$. Let
$\alpha = m/n$, and for each $r$, let $Q(r) = \alpha^r \int_0^1 \{g^2(y)/[\alpha + g(y)]^{r+1}\}\, dy$. Then it is shown that under
mild restrictions on $g(y)$ we have $P\{\lim_{n \to \infty} \sup_{r \ge 0} |Q_n(r) - Q(r)| = 0\} = 1$. This is
applied to prove consistency of certain two-sample tests such as the Wald-Wolfowitz run
test (Ann. Math. Stat., Vol. 11 (1940), pp. 147-162). One of these tests is shown to have a
further desirable property.


15. Remarks on Characteristic Functions, Eugene Lukacs, The Catholic
University of America and The Office of Naval Research.


Let $F(x)$ be a distribution function and denote by $\varphi(t)$ its characteristic function (Fourier
transform). Functions of characteristic functions are studied which are themselves
characteristic functions. The following theorem is established: Let $\varphi(t)$ be a characteristic
function and let $G(z)$ be a function of the complex variable $z$ which is analytic in $|z| < R$,
where $R > 1$. The function $G[\varphi(t)]$ is also a characteristic function if, and only if, $G(z)$ has a
power series expansion about the origin with non-negative coefficients and if $G(1) = 1$.
The class of functions $G(z)$ which have the property that $G[\varphi(t)]$ is a characteristic function
whenever $\varphi(t)$ is a characteristic function includes also functions which are not analytic,
for example the function $|z|^2$. By means of the theorem, one obtains also the following
result: Let $\varphi(t)$ be an arbitrary characteristic function and $p$ be a real number such that
$p > 1$; then, $(p - 1)/[p - \varphi(t)]$ is the characteristic function of an infinitely divisible
distribution.
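
The last result can be checked against the theorem directly. The short derivation below is a sketch that takes $G(z) = (p - 1)/(p - z)$ and verifies only that $G[\varphi(t)]$ is a characteristic function; the infinite divisibility needs a further argument that the abstract does not give and that is not reproduced here.

```latex
% Geometric-series expansion of G about the origin, valid for |z| < p:
G(z) = \frac{p-1}{p-z}
     = \Bigl(1 - \frac{1}{p}\Bigr) \sum_{k=0}^{\infty} \frac{z^{k}}{p^{k}},
\qquad |z| < p .
% The coefficients a_k = (1 - 1/p) p^{-k} are non-negative and sum to one:
a_k \ge 0, \qquad G(1) = \sum_{k=0}^{\infty} a_k = \frac{p-1}{p-1} = 1 .
% G is analytic in |z| < R with R = p > 1, so the theorem applies and
G[\varphi(t)] = \sum_{k=0}^{\infty} a_k\, \varphi(t)^{k}
% is a characteristic function: a mixture, with geometric weights a_k,
% of the characteristic functions of the k-fold convolutions of F.
```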


16. The Limiting Distribution of the Serial Correlation Coefficient in the
Explosive Case, John S. White, University of Manitoba.


An auto-regressive process satisfying the stochastic difference equation $x_t = \alpha x_{t-1} + u_t$,
($t = 1, 2, \cdots$), where the $u$'s are independent identically distributed random variables,
$x_0$ is a constant, and $\alpha$ is an unknown parameter, is said to be explosive if $|\alpha| \ge 1$. If the
$u$'s are normally distributed with mean zero, it is shown that the maximum likelihood
estimator for $\alpha$ has an asymptotic Cauchy distribution when $|\alpha| > 1$. For $|\alpha| = 1$, a
characteristic function is obtained for the limiting distribution. For $\alpha = 1$, it is also shown that
the limiting distribution of the maximum likelihood estimator for $\alpha$ is the distribution of a
certain functional of a Wiener process.


17. The Distribution of the Ratio of Two Measures of Normal Dispersion,
H. O. Hartley, Iowa State College.


Let us denote by $x_i$ ($i = 1, 2, \dots, n$) a random sample of $n$ items from $N(0, 1)$ and by
$\bar{x}$ and $s^2$ the sample mean and variance, i.e., $\bar{x} = n^{-1} \sum x_i$; $s^2 = (n - 1)^{-1} \sum (x_i - \bar{x})^2$.
Consider now the measure of dispersion $\varphi = \varphi(x_1 - \bar{x}, \dots, x_n - \bar{x})$, where $\varphi$ is a 1st order
homogeneous function of its arguments $x_i - \bar{x}$, and finally $u = \varphi/s$. Special cases of such
a ratio which have been considered in the literature are: (a) $\varphi$ = range $= x_{\max} - x_{\min}$
(David, Hartley and Pearson) Biometrika, 41, 482; (b) $\varphi = x_{\max} - \bar{x}$ (Pearson and
Chandrasekar) Biometrika 28, 308; (c) $\varphi = (1/n) \sum |x_i - \bar{x}|$ (Geary) Biometrika 27, 310, 353. Here
we develop a general distribution theory for the ratio $u$ based on a fundamental integral
equation: Introducing the probability integrals $F(U) = \Pr\{1/u \le U\}$; $G(U) = \Pr\{1/\varphi \le U\}$
it follows from the independence of $u$ and $s$ that

$$G(U) = \int_0^{\infty} F(Us)\, f_\nu(s)\, ds \qquad (1)$$

where $f_\nu(s)$ is the ordinate distribution of $s$ based on $\nu = n - 1$ degrees of freedom. Given the known
integral $G(U)$ equation (1) is an integral equation Fredholm 1st kind for the unknown
integral $F(U)$. Various methods of solving this equation are discussed and applied to cases
(a) and (b), supplying answers unobtainable by the methods hitherto employed. Also,
(d) $\varphi = \{\sum (x_i - x_{i+1})^2/(n - 1)\}^{1/2}$ (von Neumann; Ann. Math Stat. 12, 367); (e) $\varphi = \sum |x_i - x_{i+1}|/(n - 1)$ (Kamat, Biometrika, 40, 116).







18. Estimating a Linear Functional Relation, H. Fairfield Smith, North
Carolina State College.

The problem considered is to estimate a theoretical line $\xi_1 \cos\theta - \xi_2 \sin\theta - P = 0$
from paired observations $x_{1i}, x_{2i}$ which are assumed to be random vectors from bivariate
normal distributions around arbitrary centers $\xi_{1i}, \xi_{2i}$. Most attempts to fit such a functional
relation to observations with errors in both variates introduce the $\xi_{1i}$ as what Neyman
and Scott (1947) called "incidental parameters." These bring troubles to both least squares
and maximum likelihood formulations. Attention is focused on the condition that the only
ascertainable quantities from which a solution must be deduced are deviations of
observations from the line in some prescribed direction. These have univariate distributions whose
expectations in general deviate from the line by amounts proportional to their distances
from the respective $\xi_{1i}$. But when, and only when, the deviations are measured in one
particular direction their expectations are zero independently of $\xi_{1i}$. By utilizing this
condition, and only thus, the incidental variables can be eliminated from the problem. The
probability of a sample can then be expressed in terms of univariate normal distributions
about the line, and maximum likelihood may be applied free of incidental variables.
Kummell's solution is then seen to be unique and efficient. The estimator of the angle $\theta$ is
unbiased and its asymptotic variance may be evaluated. With certain supplementary
conditions the exact sampling distribution has been obtained. (Supported by the Office of
Ordnance Research).


19. Asymptotic Distributions of Roots of Certain Determinantal Equations, 
R. GNANADESIKAN, University of North Carolina. 


The tests obtained by Roy for the hypotheses: (i) $\xi_1 = \xi_2 = \cdots$, i.e., $\Sigma^* = 0$ where
$\Sigma^*$ is the "between" covariance matrix, and (ii) $\Sigma_{12}(p \times q) = 0$ where $\Sigma_{12}$ is the covariance
matrix between a $p$-set and a $q$-set ($p \le q$), on multivariate normal populations depend
on the largest characteristic roots of (i) $S^* S^{-1}$ where $S^*$ and $S$ are the sample "between"
and "within" dispersion matrices respectively; and (ii) $S_{11}^{-1} S_{12} S_{22}^{-1} S_{12}'$ where $S_{11}(p \times p)$,
$S_{22}(q \times q)$ are sample covariance matrices of the $p$-set and the $q$-set respectively, and
$S_{12}(p \times q)$ is the sample covariance matrix between the $p$-set and the $q$-set. The exact
c.d.f. of this largest root has been obtained by Roy. For large sample sizes the problem
becomes identical with that of finding the c.d.f. of the largest characteristic root of the
sample dispersion matrix for a sample from one multivariate normal population. This
limiting distribution has been obtained by Nanda for two particular cases, but there exists
no explicit and general method of obtaining it. This has been done now. Also considering
the test of $\Sigma = \Sigma_0$ ($= I(p)$ in particular) on one $p$-variate normal population we get Roy's
test depending on the largest and the smallest characteristic roots of $S(p \times p)$, the sample
covariance matrix. The joint c.d.f. of the largest and smallest roots has been obtained.
Explicit expressions for some particular cases have also been obtained.


20. Investigation of the Possibility of Using Likelihood Ratio Tests of Certain 
Multivariate Hypotheses, for Obtaining Confidence Bounds, R. GNANA- 
DESIKAN, University of North Carolina, (By Title). 


The likelihood ratio tests considered are of the following composite hypotheses on one
or more multivariate normal populations $N[\xi(p \times 1), \Sigma(p \times p)]$: (i) $H_0$: $\Sigma = I(p)$ (one
population), (ii) $H_0$: $\Sigma_1 = \Sigma_2$ (two populations), (iii) $H_0$: $\xi_1 = \cdots = \xi_k$ (analysis of variance
of mean vectors for $k$ populations) and (iv) $H_0$: $\Sigma_{12}(p \times q)(p \le q) = 0$ (where $\Sigma_{12}$ is the
covariance matrix between a $p$-set and a $q$-set), the alternative in each case being $H \ne H_0$.
One wants in each case confidence bounds (in terms of the observations) on meaningful







parametric functions which, as it were, would measure departures from the null hypothesis,
such functions being (for the different cases) the respective characteristic roots of (i) $\Sigma$,
(ii) $\Sigma_1 \Sigma_2^{-1}$, (iii) $\Sigma^* \Sigma^{-1}$ (where $\Sigma^*$ is the "between" and $\Sigma$ the "within" dispersion matrix
of the $k$ populations) and (iv) $\Sigma_{11}^{-1} \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{12}'$ (where $\Sigma_{11}$ and $\Sigma_{22}$ are the dispersion matrices
of the $p$-set and the $q$-set and $\Sigma_{12}$ has been already defined). While the confidence bounds
on these parametric functions are already available if one starts from other tests of these
hypotheses and then inverts, it is found that if one starts from the likelihood ratio tests and
then tries to invert, the problem of separation of the parametric functions from the
observations becomes quite difficult.


21. Asymptotic Efficiencies of a Nonparametric Life Test for Investigating
Smaller Percentiles of a Gamma Distribution, John E. Walsh, Lockheed
Aircraft Corporation, (By Title).


In many life testing situations the quantity of interest is a specified smaller percentage 
point of the statistical population investigated. For example, a substantial loss may be 
incurred if more than a specified small percentage of the items of the population have the 
property of failing too soon. This paper considers some well known nonparametric tests of 
the sign test type and investigates their properties when applied to smaller percentage 
points for the case of a sample from a gamma probability distribution. Asymptotically, the 
nonparametric results are found to be highly efficient compared to the ‘“‘best’’ parametric 
results based on the same fraction of items failed for the case of gamma distributions. 
Intuitive reasoning indicates that this high efficiency property holds for any reasonable 
type of statistical population and any sample size. Appropriate use of these nonparametric 
tests and estimates sometimes can yield a saving in cost and/or time without loss of sta- 
tistical efficiency since the experiment can be stopped when only a fraction of the items 
being life tested are failed. 


22. A Test of Judge Concordance for Paired Comparison Designs, (Preliminary
Report), J. W. Wilkinson, University of North Carolina.


In a recent paper (to appear in Biometrika), R. C. Bose has given several highly symmetrical 
designs where each of $v$ judges compares $r$ pairs of $n$ objects, $(1 < r \le n(n - 1)/2)$, 
and each pair is compared by $k$ judges, $(1 < k \le v)$. To obtain a test of judge concordance 
for these designs, a pseudo-preference matrix $P_i = (p_{ist})$, $(s, t = 1, 2, \ldots, n)$, is constructed 
for each judge $i$, $(i = 1, 2, \ldots, v)$, where $p_{ist} = 1$ or $0$ according as $a_s$ is, or is not, 
preferred to $a_t$. The diagonal cells, and cells $p_{ist}$ and $p_{its}$ when $a_s$ and $a_t$ are not compared by 
judge $i$, are left blank. A statistic $\Sigma'$ is defined as $\Sigma' = \sum \gamma_{ij}(\gamma_{ij} - 1)/2$, where summation is 
extended over the non-diagonal cells of $P = \sum_{i=1}^{v} P_i$, and where $\gamma_{ij}$ is the entry in cell 
$(i, j)$ of $P$. The distribution of $\Sigma'$ under the hypothesis that the preferences are allotted 
at random has been obtained, and has been tabulated for most of the known designs. Calculation 
of the first few moments of $\Sigma'$ would indicate that a linear function of it is a $\chi^2$ for 
large $n$ and $k$. For $r = n(n - 1)/2$, $\Sigma'$ and its distribution are identical with those obtained 
by Kendall and B. Smith (Biometrika, Vol. 31, 1940). 
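
The definition of the statistic in abstract 22 can be illustrated with a short Python sketch. It sums the judges' pseudo-preference matrices and evaluates the sum of gamma(gamma - 1)/2 over the off-diagonal cells. The function name, the use of `None` for blank cells, and the toy data are assumptions made for illustration; they do not come from the abstract.

```python
def concordance_statistic(judge_matrices):
    """Sum gamma*(gamma-1)/2 over the off-diagonal cells of P = sum of P_i.

    Each P_i is an n x n list of lists with entries 1, 0 or None, where None
    marks a blank cell (diagonal, or a pair not compared by that judge).
    """
    n = len(judge_matrices[0])
    total = 0
    for s in range(n):
        for t in range(n):
            if s == t:
                continue  # diagonal cells are left blank
            # gamma = number of judges preferring object s to object t
            gamma = sum(P[s][t] or 0 for P in judge_matrices)
            total += gamma * (gamma - 1) // 2
    return total


# Three judges who agree completely on the order a0 > a1 > a2.
agree = [[None, 1, 1], [0, None, 1], [0, 0, None]]
print(concordance_statistic([agree, agree, agree]))
```

With complete agreement among v judges on all pairs of n objects, each preferred cell contributes v(v - 1)/2, so the toy call counts agreeing pairs of judges over all pairs of objects.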


23. On the Efficiency of Certain Classes of Tests Based on U-Statistics, Joan 
Raup Rosenblatt, National Bureau of Standards. 


A class of non-parametric decision problems is characterized by partition of a set of 
possible probability distributions into sets defined by the value of a functional. Particular 
attention is given to a class of functionals of the form $E\phi(X; F)$, where $X$ is a vector random 
variable with distribution $F(x)$, and especially to the subclass in which the function 
$\phi(x)$ takes only the values zero and one. Certain families of tests are considered, which are 
based on functions of observed values of $X$ which depend on these values only through 
the function $\phi(x)$. One such family is that based on the U-statistic corresponding to the 
functional $E\phi(X)$. 

Methods are developed for computing an asymptotic expression for an index of efficiency 
for these families of tests relative to decision problems stated in terms of values of $E\phi(X; F)$. 
These methods are applied in particular to comparison of a family of two-sample tests based 
on the Wilcoxon-Mann-Whitney statistic with a family of tests based on a related statistic 
which has binomial distribution. Additional examples are given. (Work done at the Univ. 
of No. Carolina, with the support of the U.S. Air Force.) 


24. The Dynamic Statistical Decision Problem when the Component Problem 
Involves a Finite Number, m, of Distributions, James F. Hannan, 
Michigan State University. 


The dynamic problem consists in a sequence of N statistical decision problems with 
identical formal structure. Decisions are made successively within components and the 
risk of a sequence decision function is taken to be the average of the risks incurred in the 
components. An earlier paper by Hannan and Gaddum (submitted for an Annals of Mathe- 
matics Study) considered a sequence of formally identical two person S-games under the 
assumptions: (i) II’s risk points in the component game form a closed and convex subset 
of the unit m cube, (ii) II’s choice of strategy in each component can depend on the e.d. 
(empirical distribution) of I’s moves in prior components. The principal theorem of that 
paper exhibited a usable sequence strategy (not depending on $N$) whose average risk across 
the $N$ games exceeds by less than $(6m/N)^{1/2}$ the single-game minorant risk against the e.d. 
of I’s $N$ moves. The present paper is concerned with the substitution of estimates for the 
successive e.d. in (ii). If mixtures of the $m$ distributions have unique representation, there 
exists a bounded vector kernel for unbiased estimates of the e.d. and the sequence decision 
function obtained by the substitution of unbiased estimates of this type satisfies the 
analogous theorem with the bound on the excess increased by multiplication by a bound 
of the kernel. 


25. On Certain Systems of Experiments as Interdependent Stochastic Processes, 
(Preliminary Report), David Rosenblatt, American University, (By 
Title). 


In certain systems of experiments, one may regard behavioral interaction between a 
“responsive” generalized subject and an experimenter in terms of interdependent discrete 
time-parameter stochastic processes. The subject (2) engages in actions or decisions $A_{2k}$; 
the experimenter (1), on the basis of an estimated conditional distribution of subject’s 
actions, produces stimuli or treatments $A_{1k}$, where $A_{ik}$ takes one of the values $a_{i1}, \ldots, a_{il}$, 
$i = 1, 2$, $k = 1, 2, \ldots$; subject and experimenter act alternately. The $i$th entity ($i = 1, 2$) 
possesses (a) a probability distribution $R_{ik}$ over a finite set of $m_i$ configuration states 
(threshold or preference configurations), i.e., a point in the $m_i$-dimensional simplex; (b) 
parameter stochastic operators ($m_i \times m_i$ matrices, row sum unity) $\Pi_{ij}$, $j = 1, \ldots, l$; 
(c) parameter mapping $\Gamma_i$ or behavior function (matrix r.s.u.) which takes $R_{ik}$ into the 
conditional distribution $G_{ik}$ of $A_{ik}$, $G_{ik}$ assuming values in the $l$-dimensional simplex. 
Let $f_j(A_{ik}) = 1$ if $A_{ik} = a_{ij}$, $= 0$ otherwise. Consider the system: (i) $R_{i1} = R_{i0}$; (ii) 
$R_{1,k+1} = R_{1k}\bigl(\sum_{j=1}^{l} f_j(A_{2k})\,\Pi_{1j}\bigr)$; (iii) $R_{2k} = R_{2,k-1}\bigl(\sum_{j=1}^{l} f_j(A_{1k})\,\Pi_{2j}\bigr)$; (iv) $G_{ik} = R_{ik}\Gamma_i$, $i = 1, 2$, 
$k = 1, 2, \ldots$, where each relation holds with probability one. The “conjoint process” 
$E_k = (A_{2k}, R_{2k}, A_{1k}, R_{1k})$, $k = 2, 3, \ldots$, is a Markov process with stationary transition 
function. Conditions are adduced under which $\lim (G_{1k} - G_{2k}) = 0$, the null vector, with 
probability one, where $G_{1k}$ is the experimenter’s “estimate” of the “labile” conditional 
distribution of the subject’s actions, $G_{2k}$. This model is constructed within the formal 
system OFK presented at the August, 1955, meeting of IMS (forthcoming abstract in 
Econometrica). 


26. A Spherically-Symmetric Order Statistic r, (Preliminary Report), Brian 
Gluss and Fred L. Strodtbeck, University of Chicago. 


$n$ properties $A, B, \ldots, E$ are ordered $1, 2, \ldots, n$ by $m$ people and scores $a, b, \ldots, e$ 
are calculated such that $a = \sum (\text{ranks of } A) - [(n + 1)m/2]$, and so on. A statistic $r$ is 
defined in the following manner: Consider $n$ lines in an $(n - 1)$-dimensional space with 
orthogonal axes $(x_1, \ldots, x_{n-1})$, in which each line corresponds to one of the characteristics 
$A, B, \ldots, E$. The lines all pass through the origin and are such that the angles between 
all pairs are equal. Points $A, B, \ldots, E$ are defined on the lines such that the distances 
$OA, \ldots, OE$ (where $O$ is the origin) are $a, b, \ldots, e$, respectively. A point $P(X_1, \ldots, X_{n-1})$ 
is then obtained such that $OX_i$ is the sum of the projections of $OA, \ldots, OE$ on $Ox_i$. Then 
$OP = r$. The paper shows that $r = \sqrt{Sn/(n - 1)}$, where $S$ is Kendall’s statistic [M. G. 
Kendall, Rank Correlation Methods, 1948, Chap. 6]. By this approach the paper shows it 
is possible by considerations of volumes of the space to test various hypotheses including 
(i) $A = B = \cdots = E$; (ii) $A > B > \cdots > E$ versus all other contingencies. 
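
The relation between r and Kendall's S in abstract 26 can be checked numerically. The Python sketch below builds n equiangular unit directions as centred and normalised standard basis vectors of n-dimensional space (they span an (n - 1)-dimensional subspace), places each point at its score along its line, and adds the vectors. The embedding, the function name, and the toy rankings are assumptions for illustration; the abstract does not say how its axes were chosen.

```python
import math


def spherical_r(rankings):
    """Return (r, S) for m rankings of n items (each ranking lists ranks 1..n)."""
    m, n = len(rankings), len(rankings[0])
    # Score of item i: sum of its ranks minus (n + 1) m / 2.
    scores = [sum(rk[i] for rk in rankings) - (n + 1) * m / 2 for i in range(n)]
    # Equiangular unit vectors: e_i minus the centroid, rescaled to length one.
    norm = math.sqrt(1 - 1 / n)
    units = [[((1.0 if c == i else 0.0) - 1 / n) / norm for c in range(n)]
             for i in range(n)]
    # P is the vector sum of the scaled directions; r is its length.
    p = [sum(scores[i] * units[i][c] for i in range(n)) for c in range(n)]
    r = math.sqrt(sum(x * x for x in p))
    S = sum(a * a for a in scores)  # sum of squared score deviations
    return r, S


rankings = [[1, 2, 3, 4], [2, 1, 3, 4], [1, 3, 2, 4]]
r, S = spherical_r(rankings)
print(r, math.sqrt(S * 4 / 3))
```

The two printed values agree because the scores sum to zero and every pair of directions has cosine -1/(n - 1).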


27. Generalized Normalization Polynomials, D. Teichroew, University of 
California at Los Angeles and National Cash Register Company, Dayton, 
Ohio. 


Normalization polynomials have been studied by Campbell in 1923, Cornish and Fisher 
in 1937 and Hotelling and Frankel in 1938. Expansions based on these polynomials enable 
one to use the normal integral table for computing probability points of other distributions 
which approach the normal. These expansions have been expansions about the mean and 
the further the variate is from the mean the higher is the error of approximation. This 
paper shows that it is possible to get polynomials for expansions about an arbitrary point. 
These expansions are useful for obtaining probability points for probabilities very close to 
zero or one. Some expansions are obtained for the $t$ and Gamma distributions. (Part of 
this work was sponsored by the Office of Naval Research). 


28. “No Interaction” in a Three-way Table, Marvin A. Kastenbaum, University 
of North Carolina. 


Let $n_{ijk}$ denote the observed frequency, and $p_{ijk}$ the probability of having an observation 
in the $(ijk)$th cell of a three-way table, $(i = 1, 2, \ldots, r;\ j = 1, 2, \ldots, s;\ k = 1, 2, \ldots, t)$. 
Also let the marginal probabilities $\sum_i p_{ijk} = p_{0jk}$, etc., $\sum_{i,k} p_{ijk} = \sum_k p_{0jk} = p_{0j0}$, etc., 
and $\sum_{i,j,k} p_{ijk} = 1$, and $\sum_{i,j,k} n_{ijk} = n$. Then, if the marginal frequencies are stochastic 
variates, the condition of independence between “$(ij)$” and “$k$” is expressed as: (1) $p_{ijk} = p_{ij0}\,p_{00k}$; 
and the conditions of independence between “$i$” and “$k$” and between “$j$” and 
“$k$” are given as (2) $p_{i0k} = p_{i00}\,p_{00k}$, and (3) $p_{0jk} = p_{0j0}\,p_{00k}$, respectively. A condition of 
“no interaction” is defined as one which, when taken together with (2) and (3), will yield 
(1). This condition is (4) $p_{ijk} = (q_{ij0}\,q_{i0k}\,q_{0jk})/(q_{i00}\,q_{0j0}\,q_{00k})$, where it is not assumed that 
$q_{ij0} = p_{ij0}$, etc., nor even that $q_{i00} = \sum_j q_{ij0}$, etc. The $q$’s in (4) may be eliminated in such 
a fashion as to yield distinct relationships among the $p_{ijk}$’s, namely: (5) $(p_{rst}\,p_{ijt})/(p_{ist}\,p_{rjt}) = 
(p_{rsk}\,p_{ijk})/(p_{isk}\,p_{rjk})$, $(i = 1, 2, \ldots, r - 1;\ j = 1, 2, \ldots, s - 1;\ k = 1, 2, \ldots, t - 1)$. 
If the hypothesis (4) of “no interaction” is to be tested, then the $p_{ijk}$’s may be estimated 
by maximizing the multinomial likelihood function, subject to (5) and to the further constraint 
$\sum_{i,j,k} p_{ijk} = 1$. The resulting $\hat{p}_{ijk}$’s, expressed in terms of their respective $n_{ijk}$’s 
and the deviations from expectation, are then substituted into the expression 
$\sum_{i,j,k} (n_{ijk} - n\hat{p}_{ijk})^2/n\hat{p}_{ijk}$, which for large $n$ is distributed as $\chi^2$ with $(r - 1)(s - 1)(t - 1)$ 
degrees of freedom. 
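
One way to obtain fitted cell frequencies under the no-interaction condition of abstract 28 is iterative proportional fitting, sketched below in Python. This is a swapped-in technique: the abstract describes constrained maximisation of the likelihood and does not name an algorithm. Starting from a constant table, every rescaling keeps the cross-product ratios in (5) equal across layers while matching the observed two-way margins in turn. The function names, iteration limits, and the toy table are assumptions, and all two-way margins are assumed positive.

```python
from itertools import product


def fit_no_interaction(table, iters=500, tol=1e-12):
    """Fit expected counts with equal cross-product ratios across layers.

    table[i][j][k] holds observed counts; returns a dict keyed by (i, j, k).
    Iterative proportional fitting is used here as an illustrative method.
    """
    r, s, t = len(table), len(table[0]), len(table[0][0])
    cells = list(product(range(r), range(s), range(t)))
    n = {c: float(table[c[0]][c[1]][c[2]]) for c in cells}
    start = sum(n.values()) / len(cells)
    m = {c: start for c in cells}
    for _ in range(iters):
        before = dict(m)
        for drop in (2, 1, 0):  # axis summed out: (i,j), (i,k), (j,k) margins
            obs, fit = {}, {}
            for c in cells:
                key = c[:drop] + c[drop + 1:]
                obs[key] = obs.get(key, 0.0) + n[c]
                fit[key] = fit.get(key, 0.0) + m[c]
            for c in cells:
                key = c[:drop] + c[drop + 1:]
                m[c] *= obs[key] / fit[key]  # rescale to match observed margin
        if max(abs(m[c] - before[c]) for c in cells) < tol:
            break
    return m


def pearson_chi2(table, fitted):
    """Pearson statistic and its (r-1)(s-1)(t-1) degrees of freedom."""
    r, s, t = len(table), len(table[0]), len(table[0][0])
    chi2 = sum((table[i][j][k] - fitted[(i, j, k)]) ** 2 / fitted[(i, j, k)]
               for i, j, k in fitted)
    return chi2, (r - 1) * (s - 1) * (t - 1)


table = [[[10, 20], [30, 40]], [[15, 5], [25, 35]]]
fitted = fit_no_interaction(table)
print(pearson_chi2(table, fitted))
```

The fitted values play the role of the estimated expectations in the final expression of the abstract.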


29. On Bartlett’s Test of Complex Contingency Table Interaction, Sujit Kumar 
Mitra, University of North Carolina. 


Contrary to some of the current beliefs it is shown that the stochastic cubic equation 
suggested to Professor Bartlett (J. Roy. Statist. Soc., suppl. 1935) by Professor R. A. 
Fisher in developing a test of his hypothesis of no interaction in a $2 \times 2 \times 2$ table might, 
with probability approaching one, have all three roots real, under the null hypothesis 
of no-interaction. In such a case only the real root with the numerically smallest value 
will validate the use of the $\chi^2$ test. This could be immediately extended to a general $r \times s \times t$ 
table. A little thought would reveal that the numerical example considered by Bartlett 
actually represents 4 samples from 4 binomial populations. An attempt has been made to 
interpret interaction and main effects in this case and to furnish suitable tests for the 
suggested hypotheses. 


30. A Theorem in Minimum Chi Square, Sujit Kumar Mitra, University of 
North Carolina, (By Title). 


Let the possible results of a certain random experiment $E$ be divided into $r$ mutually 
exclusive groups and suppose that the probability of obtaining a result belonging to the 
$i$th group is $p_i = p_i(\alpha_1, \ldots, \alpha_s)$ where $\alpha = (\alpha_1, \ldots, \alpha_s)$ is an inner point of some 
non-degenerate interval $A$. We assume that $p_i(\alpha_1, \ldots, \alpha_s)$ considered as functions of $\alpha_1, \ldots, \alpha_s$ 
over $A$ satisfy Cramér’s conditions (a), (b), (c) and (d). (See Cramér: Mathematical Methods 
of Statistics, Section 30.3.) Let $f_k(\alpha_1, \ldots, \alpha_s)$, $k = 1, 2, \ldots, t \le s$, be $t$ functions of $\alpha_1, \ldots, \alpha_s$ 
such that for all points in $A$, the $f_k$ satisfy the following conditions: (e) Every $f_k$ has continuous 
derivatives $\partial f_k/\partial \alpha_j$ and $\partial^2 f_k/\partial \alpha_j \partial \alpha_l$; (f) the matrix $\{\partial f_k/\partial \alpha_j\}$ where $k = 1, 
2, \ldots, t$, $j = 1, 2, \ldots, s$ is of rank $t$. Let $f_k^0$ $(k = 1, 2, \ldots, t)$ be certain numbers in the range 
of the respective $f_k$’s over $A$. We denote by $H$ the hypothesis $f_k(\alpha_1, \ldots, \alpha_s) = f_k^0$ $(k = 1, 2, \ldots, 
t)$. Let $\nu_i$ denote the number of observations belonging to the $i$th group in $n$ actual repetitions 
of $E$. Cramér has shown that the modified minimum chi-square equations have exactly one 
system of solutions $\hat\alpha = (\hat\alpha_1, \ldots, \hat\alpha_s)$ such that $\hat\alpha \to \alpha^0$ in probability as $n \to \infty$ and such that 
$\chi_0^2 = \sum_{i=1}^{r} (\nu_i - np_i(\hat\alpha))^2/np_i(\hat\alpha)$ is asymptotically distributed as a $\chi^2$ with $r - s - 1$ d.f. The 
following results are proved: If $H$ is true (1) the modified minimum $\chi^2$ equations subject to 
restriction $H$ have exactly one system of solutions $\hat\alpha_H$ such that $\hat\alpha_H \to \alpha^0$ in probability as $n \to \infty$, and 
such that $\chi_H^2 = \sum_{i=1}^{r} (\nu_i - np_i(\hat\alpha_H))^2/np_i(\hat\alpha_H)$ is asymptotically distributed as $\chi^2$ with 
$r - s + t - 1$ d.f.; (2) $\chi_H^2 - \chi_0^2$ is asymptotically distributed independently of $\chi_0^2$ as a $\chi^2$ 
with $t$ d.f. This result is analogous to a result in least squares proved by C. R. Rao in his 
book. This also extends a result proved by Neyman (Proc. First Berk. Symp.). 


31. Sequential Estimation from a Finite Population, Herbert T. David and 
Ingram Olkin, University of Chicago. 


This paper is concerned with sequential estimation of the fraction defective $p$, in a 
finite (hypergeometric) population. The development is similar to that of Girshick, 
Mosteller, Savage (Ann. Math. Stat., v. 17, 1946, 13-23), who treat the case of infinite 
(binomial) populations. Path count ratios play essentially the same role in both cases, and are 
shown to provide unique unbiased estimates of certain functions of p when the regions are 
simple. Expressions for the variance of the estimate of p are given for both cases, and it 
is shown that for symmetric boundaries the variances in the finite and infinite situations 
are formally similar polynomials in pq of the same degree. A generalized finite population 
correction is discussed and, in particular, boundaries for which the variance is equal to 
cpq are considered. 


32. Tables for Computing Bivariate Normal Probabilities, Donald B. Owen, 
Sandia Corporation. 


A table of $T(h, a) = \frac{1}{2\pi} \int_0^a \frac{\exp[-\frac{1}{2}h^2(1 + x^2)]}{1 + x^2}\,dx$, which may be used to 
obtain bivariate normal probabilities, has been computed to be used with a special two- 
dimensional linear interpolation scheme. The function is tabulated in two tables, one 
table having a coarse interval in one of the parameters and an interval fine enough for 
ordinary linear interpolation in the second parameter. The second table has the coarse 
interval on the second parameter and the fine interval on the first. By choosing the four 
points at the coarse intervals of the two tables that are nearest to a value to be inter- 
polated and four other points on the fine intervals, the interpolation scheme gives ac- 
curacies comparable to ordinary linear interpolation with only ten per cent as many entries 
as that required for ordinary linear interpolation. 
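
The tabulated function of abstract 32 can also be evaluated directly by numerical quadrature. The Python sketch below applies composite Simpson's rule to the integral defining T(h, a); it does not reproduce the two-table interpolation scheme of the abstract. The function name, the step count, and the choice of Simpson's rule are assumptions for illustration.

```python
import math


def owens_t(h, a, steps=2000):
    """T(h, a) = (1/(2 pi)) * integral from 0 to a of
    exp(-h^2 (1 + x^2) / 2) / (1 + x^2) dx, by composite Simpson's rule."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of subintervals

    def integrand(x):
        return math.exp(-0.5 * h * h * (1 + x * x)) / (1 + x * x)

    width = a / steps
    total = integrand(0.0) + integrand(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * width)
    return total * width / 3 / (2 * math.pi)


print(owens_t(0.0, 1.0))  # arctan(1) / (2 pi)
```

For h = 0 the integrand reduces to 1/(1 + x^2), so T(0, a) equals arctan(a)/(2 pi), which gives a simple check on the quadrature.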


33. Bounds and Approximations for Constants Used in Quality Control, J. T. 
Chu, University of North Carolina and Case Institute of Technology. 


Very close, yet very simple in form, upper and lower bounds are obtained for constants 
$a$, $c_2$, $b$, $A$, $E_i$, and $B_i$, $i = 1, \ldots, 4$, often used in quality control to set up control charts 
for individual observations and the means and standard deviations of groups of observations. 
For example, let a random sample $x_1, \ldots, x_n$ be drawn from a normal distribution with 
variance $\sigma^2$. If $s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2/(n - 1)$ where $\bar{x} = \sum_{i=1}^{n} x_i/n$ and $a = E(s)/\sigma$, then 
$[(2n - 3)/(2n - 2)]^{1/2} \le a \le [(2n - 2)/(2n - 1)]^{1/2}$ for all integers $n \ge 2$. In using these 
bounds and their arithmetic mean as approximations to $a$, the proportional errors are shown 
to be respectively less than $E = 1/[2(2n - 1)(2n - 3)]$ and $E/2$ ($E/2 = .004$ if $n = 5$). Similar 
results are obtained for the constants mentioned above. Tables are given for illustration. 
(Research partially supported by the Office of Naval Research). 
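
The bounds quoted in abstract 33 for a = E(s)/sigma can be compared with the exact value, which for a normal sample is sqrt(2/(n - 1)) Gamma(n/2)/Gamma((n - 1)/2). The Python sketch below does this for one sample size. The function names are assumptions, and the error limit E is taken as 1/[2(2n - 1)(2n - 3)], the reading that agrees with the quoted value E/2 = .004 at n = 5.

```python
import math


def a_exact(n):
    """E(s)/sigma for a normal sample of size n (s^2 uses divisor n - 1)."""
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2 / (n - 1)) * math.exp(log_ratio)


def a_bounds(n):
    """Lower and upper bounds for a as quoted in the abstract."""
    lower = math.sqrt((2 * n - 3) / (2 * n - 2))
    upper = math.sqrt((2 * n - 2) / (2 * n - 1))
    return lower, upper


def error_limit(n):
    """E, the stated limit on the proportional error of either bound."""
    return 1 / (2 * (2 * n - 1) * (2 * n - 3))


n = 5
lower, upper = a_bounds(n)
a = a_exact(n)
print(lower, a, upper)
print((a - lower) / a, (upper - a) / a, error_limit(n))
```

The arithmetic mean of the two bounds, (lower + upper)/2, can be compared with `a` in the same way against E/2.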


34. Four Streams of Traffic Converging on a Cross-Road, Brian Gluss, University 
of Chicago, (By Title). 


Four streams of traffic arrive at a cross-road in independent Poisson processes. The lights 
operate such that if they have just turned red against two of the streams they will turn 
green again when either (i) $n$ cars are waiting, or (ii) a time $a$ has passed, whichever is the 
sooner. $n$ and $a$ are prearranged. The frequency function and expectation of the waiting 
time $\tau$ of a car wishing to go straight or to turn right are obtained: 
$E(\tau) = \bigl[n(n + 1)/2 - e^{-ma} \sum_{R} (n - R + 1)(n + R)(ma)^R/2(R)!\bigr]/(t_1 + t_2)m^2$, where $m$ = sum of mean flows of 
the two streams for which the lights are red, and $t_1$ and $t_2$ are the expected red time-periods 
for the two sets of two streams. The frequency functions of the waiting times $u$, $v$ of a car 
arriving in a green or red period respectively and wishing to turn left, and therefore having 
to wait for a sufficient time-gap in the opposing stream or until the lights turn red again, 
are found. $E(u)$ and $E(v)$ are then calculated for some sets of parameter-values by approximate 
integration. 


35. Markov Processes Arising in Learning Models, John G. Kemeny and 
J. L. Snell, Dartmouth College. 


The paper studies two learning models, one due to W. K. Estes, and the other due to 
R. R. Bush and F. Mosteller. In the cases studied both models lead to one-parameter 
families of Markov processes; the Estes model having a finite number of states, the Bush- 
Mosteller model an infinite number. For each value of the learning-parameter there is a 
single Bush-Mosteller process, but an infinite number of Estes models—one for each pos- 
sible number of states. It is shown that for a given value of the learning-parameter, as the 
number of states tends to infinity, the stationary distribution of the Estes processes tends 
to the stationary distribution of the corresponding Bush-Mosteller process. Moments of the 
stationary distribution are found in the Bush-Mosteller processes, and the distributions 
themselves are also found in several special classes of processes. It is shown that as the 
learning-parameter tends to zero the stationary distributions in both models approach 
very simple distributions. Since some psychologists are interested primarily in low values 
of the learning-parameter, this result provides simple approximate answers. 


36. On a Decision Rule for Selecting a Group Containing the Population with 
the Largest Mean, (Preliminary Report), R. C. Bose and S. S. Gupta, 
University of North Carolina, (By Title). 


Suppose there are $(n + 1)$ normal populations $N(\mu_i, \sigma^{*2})$, $i = 0, 1, 2, \ldots, n$, and that 
$\bar{x}_0, \bar{x}_1, \ldots, \bar{x}_n$ are the $(n + 1)$ means based on samples of equal size $k$, one from each population. 
One would like to select as small a group as possible subject to the restriction that the 
least upper bound of the probability of not including in the group the population with 
the largest mean is $\alpha$ $(0 < \alpha < 1)$. K. C. Seal (Ann. Math. Stat., Sept., 1955) has given an 
infinite class of decision rules for this problem and has obtained an optimum rule for the 
situation when all means but one are equal. Another rule has been studied here in detail. 
This is based on the auxiliary statistic $u = (y_{(n)} - y_0)/s$, where $y_0, y_1, \ldots, y_n$ are 
independently and identically distributed $N(0, \sigma^2)$, $\sigma^2 = \sigma^{*2}/k$, corresponding random 
observations $y_1, y_2, \ldots, y_n$ being ordered as $y_{(1)} < y_{(2)} < \cdots < y_{(n)}$ and where $s^2$ is an 
independent estimate of $\sigma^2$. The rule is, “Reject any observation $\bar{x}_0$ from the given 
$\bar{x}_i$ $(i = 0, 1, 2, \ldots, n)$ if $\bar{x}_{(n)} - \bar{x}_0 > s u_\alpha$ and retain otherwise; where $\bar{x}_{(1)} < \bar{x}_{(2)} < \cdots < \bar{x}_{(n)}$ 
are $n$ ordered observations among $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$ and $u_\alpha$ is the upper $\alpha$ % point of 
$(y_{(n)} - y_0)/s$.” The upper 5 % points of $u$, in the case when $\sigma$ is known, have been tabulated. 


37. Recurrent Values of Sums of Independent Random Variables, (Preliminary 
Report), Louis J. Cote and Henry Teicher, Purdue University. 


Let $\{X_i\}$, $i = 1, 2, \ldots$, be a sequence of independent random variables defined on a probability 
space with $S_{m,n} = \sum_{i=m}^{n} X_i$, $S_{1,n} = S_n$ and $P_\epsilon = P\{|S_{m,n} - b| < \epsilon \text{ i.o.}\}$ where i.o. 
signifies “infinitely often” (i.e., for infinitely many values of $n \ge m$). The real number $b$ 
is called recurrent or quasi recurrent for the sequence $\{S_{m,n}\}$ according as $P_\epsilon = 1$ or $P_\epsilon > 0$ 
for all $\epsilon > 0$. The classes of such values are examined and conditions for the existence of 
recurrent or quasi-recurrent values are considered. 






38. A Problem Involving the Distribution of Shadows, (Preliminary Report), 
Herman Chernoff, Stanford University, and Joseph F. Daly, National 
Bureau of the Census. 


A source of light is at a point P and a worm is crawling in a given direction along a line 
L which does not go through P. Circular disks are distributed randomly throughout the 
plane containing P and L. Suppose that the worm can travel only in the shadow. The 
distribution of the distance the worm can travel from a given starting point is characterized. 
One such characterization involves the wave equation. The results generalize to the 
cases where the disks are replaced by line segments parallel to L and the source of light is 
at infinity; these results have applications to Geiger counter and traffic problems. 
The corresponding problems of the worm that travels only in light are rather easy to treat. 


39. Note on Two-stage Test Procedures, S. G. Ghurye, Lucknow University, 
(By Title). 


This note concerns tests of hypotheses regarding a parameter which are designed to have 
power independent of another parameter. The conditions satisfied in the problem of the 
mean of a normal distribution solved by Stein (Ann. Math. Stat., 1945) are stated more 
generally, and the corresponding general solution is given. It is shown that these condi- 
tions are also satisfied in the problem of testing for the location parameter of an exponential 
distribution by a number of two-stage tests, and the performances of some of these tests are 
compared in some particular cases. 


40. Some Properties of Generalized Sequential Probability Ratio Tests, Jack 
C. Kiefer, Cornell University, and Lionel Weiss, University of Virginia. 


Generalized sequential probability ratio tests (GSPRT) are known to form a complete 
class with respect to the probabilities of making errors and the distribution of the sample 
size, when one simple hypothesis is being tested against another. In this paper it is shown 
that (1) under certain conditions, a GSPRT is uniquely determined by the distributions 
of the sample size under the two hypotheses; (2) for a GSPRT to be admissible with respect 
to the probabilities of error and the distribution of the sample size, the decision bounds 
characterizing the test must obey certain inequalities; (3) under certain monotonicity conditions 
on the probability ratio, a GSPRT forms a complete class with respect to the probabilities 
of error and the “average” distribution of the sample size (averaged over a set of 
alternatives to the two hypotheses being tested); and (4) a class of tests complete with 
respect to the probabilities of error and the expected sample size under a third distribu- 
tion consists of truncated GSPRT whose decision bounds satisfy certain inequalities. 


41. Sequential Decision Problems for a Class of Stochastic Processes. Testing 
Hypotheses, (Preliminary Report), A. T. Bharucha-Reid, University 
of California, Berkeley. 


Let $\{X_i(t), t \ge 0\}$, $i = 1, 2$, be two different stochastic processes with continuous time 
parameter. Beginning at $t = 0$, a process $\{X(t), t \ge 0\}$, which is either $\{X_1(t)\}$ or $\{X_2(t)\}$, 
is observed continuously, and on the basis of the observed realization of the process, $x(t)$, 
the statistician wishes to decide whether $\{X(t)\}$ is $\{X_1(t)\}$ or $\{X_2(t)\}$. This problem has 
been considered by Dvoretzky, Kiefer, and Wolfowitz (Ann. Math. Stat., Vol. 24 (1953), 
pp. 254-264) when $\{X(t)\}$ is a stochastic process with stationary independent increments. 
In this paper we consider the case where $\{X(t)\}$ is a Markov process, with $x(t)$ a sufficient 
statistic for the process. We consider in particular the application of these results to some 
branching stochastic processes, e.g., the birth, death, birth-and-death, and Pólya processes. 
Let $p(x, t; \omega) = \Pr(X(t) = x \mid \omega)$, $x = 0, 1, 2, \ldots$; $\omega \in \Omega$, and denote by $D(t)$ the decision 
function $\log \{p(x, t; \omega_2)/p(x, t; \omega_1)\}$. For decision boundaries $A$ and $B$, $B < 0 < A$, the 
Wald sequential procedure is used to test the hypothesis $H_i$ $(i = 1, 2)$ that $\omega = \omega_i$, where 
$\omega_1$ and $\omega_2$ are any two positive numbers, $\omega_1 \ne \omega_2$. Let $f(d; \omega)$ denote the probability that 
the decision procedure will terminate with the acceptance of $H_2$ when the parameter is 
really $\omega$ and $D(0) = d$; and let $m(s, \tau) = E\{\exp(s\tau)\}$ be the moment generating function 
of the observation time $\tau$ necessary to reach a decision when $D(0) = d$ and the parameter 
is really $\omega$. The usual probabilistic reasoning leads to functional equations for $f(d; \omega)$ 
and $m(s, \tau)$, the analytic properties of which will be discussed in a subsequent publication. 
(This work was supported by the USAF School of Aviation Medicine.) 


42. Note on a Markov Chain with Matrix States and Some Applications, A. T. 
Bharucha-Reid and Rodabé P. Bharucha-Reid, University of California, 
Berkeley, (By Title). 


In connection with a probability problem in learning theory concerned with latent and 
reinforced types, it was necessary to consider a Markov chain with matrix states. Various 
ways of defining the transition probabilities are considered, and the asymptotic properties 
of the chain investigated. The results obtained are applicable to the study of changes in 
systems whose structure has a matrix representation, e.g., communication nets, social 
groups, etc. 


43. On the Comparison of Two Stochastic Epidemics, A. T. Bharucha-Reid, 
University of California, Berkeley, (By Title). 


In this paper the Girshick procedure for comparing or ranking two populations with 
respect to an unknown parameter is applied to the problem of comparing the effect of two 
types of housing on hospital admission rates for acute respiratory disease. The procedure 
is applied when different stochastic models are used to describe the development of the 
epidemic. Data used are from an epidemic situation studied at Sampson Air Force Base, 
Geneva, New York. (This work was supported by the USAF School of Aviation Medicine.) 


44. A Sequential Multiple Decision Procedure for Selecting the Population 
with the Largest Mean from k Normal Populations with a Common 
Unknown Variance, (Preliminary Report), R. E. Bechhofer, Cornell 
University, and M. Sobel, Bell Telephone Laboratories, (By Title). 


Let $x_{ij}$ $(i = 1, \ldots, k;\ j = 1, 2, \ldots)$ be independent observations from normal populations 
$\Pi_i$ with unknown means $\mu_i$ and a common unknown variance, and let $\mu_{[1]} \le \cdots \le \mu_{[k]}$ 
denote the ranked means. A sequential procedure is proposed which guarantees probability 
$P^*$ of selecting the population with the largest mean $\mu_{[k]}$ whenever $\mu_{[k]} - \mu_{[k-1]} \ge \delta^*$; the 
constants $P^* < 1$ and $\delta^* > 0$ are preassigned. Let 

$\bar{x}_{im} = \sum_{j=1}^{m} x_{ij}/m$, $s_m^2 = \sum_{i=1}^{k} \sum_{j=1}^{m} (x_{ij} - \bar{x}_{im})^2/k(m - 1)$, 

and $t_{ijm} = (\bar{x}_{im} - \bar{x}_{jm} - \delta^*)/s_m\sqrt{2/m}$. For $i = 1, \ldots, k$ let 

$L_{im} = 1 + \sum_{\alpha \ne i} \sum_{\beta \ne i} A_{\alpha\beta}\, t_{i\alpha m}\, t_{i\beta m}/k(m - 1)$ 

where $A_{\alpha\beta} = 2(k - 1)/k$ for $\alpha = \beta$ and $-2/k$ for $\alpha \ne \beta$, let $L_{[1]m} \le \cdots \le L_{[k]m}$ denote 
the ranked $L_{im}$, and let $P_m = L_{[k]m}/\sum_{i=1}^{k} L_{im}$. At every stage, the $k$ values $L_{im}$ differ 
with probability one and are in one-to-one correspondence with the $k$ populations $\Pi_i$; 
let $\Pi_{[k]m}$ denote the population associated with $L_{[k]m}$ at the $m$th stage. Procedure: At the 
$m$th stage $(m = 1, 2, \ldots)$ take the vector observation $(x_{1m}, \ldots, x_{km})$ and compute $P_m$. 
If $P_m \ge P^*$, stop and select $\Pi_{[k]m}$; if $P_m < P^*$, take the $(m + 1)$st vector observation and 
compute $P_{m+1}$. This procedure meets the requirement, is scale and location invariant, 
and the probability of termination is unity. The procedure can be generalized to handle 
problems such as obtaining a complete ranking of the $k$ means. (Research supported in 
part by the U. S. Air Force through the Office of Scientific Research of the ARDC.) 


45. A Scale Invariant Sequential Multiple Decision Procedure for Selecting 
the Population with the Smallest Variance from k Normal Populations, 
(Preliminary Report), R. E. Bechhofer, Cornell University, and 
M. Sobel, Bell Telephone Laboratories, (By Title). 


Let $x_{ij}$ $(i = 1, \ldots, k;\ j = 1, 2, \ldots)$ be independent observations from normal populations 
$\Pi_i$ with unknown means $\mu_i$ and unknown variances $\sigma_i^2$, and let $\sigma_{[1]}^2 \le \cdots \le \sigma_{[k]}^2$ denote 
the ranked variances. A sequential procedure is proposed which guarantees probability 
$P^*$ of selecting the population with the smallest variance $\sigma_{[1]}^2$ whenever $\sigma_{[2]}^2/\sigma_{[1]}^2 \ge \theta^*$; 
the constants $P^* < 1$ and $\theta^* > 1$ are preassigned. Let $\bar{x}_{im} = \sum_{j=1}^{m} x_{ij}/m$, 
$s_{im}^2 = \sum_{j=1}^{m} (x_{ij} - \bar{x}_{im})^2$, and $R_{jim} = s_{im}^2/s_{jm}^2$. For $i = 1, \ldots, k$ let 

$L_{im} = \bigl(\prod_{j} R_{jim}\bigr)^{[\ldots]} \bigl[1 + \sum_{j} R_{jim}/\theta^*\bigr]^{[\ldots]}$, 

let $L_{[1]m} \le \cdots \le L_{[k]m}$ denote the ranked $L_{im}$, and let $P_m = L_{[k]m}/\sum_{i=1}^{k} L_{im}$. At every 
stage, the $k$ values $L_{im}$ differ with probability one and are in one-to-one correspondence 
with the $k$ populations $\Pi_i$; let $\Pi_{[k]m}$ denote the population associated with $L_{[k]m}$ at the 
$m$th stage. Procedure: At the $m$th stage $(m = 2, 3, \ldots)$, take the vector observation 
$(x_{1m}, \ldots, x_{km})$ and compute $P_m$. If $P_m \ge P^*$, stop and select $\Pi_{[k]m}$; if $P_m < P^*$ take 
the $(m + 1)$st vector observation and compute $P_{m+1}$. This procedure meets the requirement, 
is scale and location invariant, and the probability of termination is unity. The 
procedure can be generalized to handle problems such as obtaining a complete ranking 
of the $k$ variances. A similar procedure can be used for the case of known means and for 
ranking the scale parameters of exponential populations. (Research supported in part by 
the U.S. Air Force through the Office of Scientific Research of the ARDC.) 


46. Exact Probabilities in a Test for Markoff Dependency, Reed B. Dawson, 
Jr., Department of Defense. 


This paper is concerned with Markoff dependency (of the first order) in a digital stream, 
where the object is to test the hypothesis of independence against any alternative which 
alters the probabilities of the pairs. Let $N$ digits be distributed about an oriented circle so 
that each of the $(N - 1)!$ arrays is equally likely, and form a matrix $[f_{ij}]$, where $f_{ij}$ is the 
number of digits $i$ which are followed by a digit $j$. The exact probability of this matrix of 
pairs is found, generalizing a result of Stevens (Ann. Eugenics, Vol. 9 (1939), pp. 10-17). 
This probability, asymptotically the same as the probability that a matrix with the same 
entries will arise from the usual contingency table assumptions, illuminates a special case 
of the asymptotic test of Hoel (Biometrika, Vol. 41 (1954), pp. 430-433) for Markoff de- 
pendency of general order. A formula for the expectancy of a product of factorial powers 
of the $f_{ij}$ is derived. 
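
The matrix of pair counts in abstract 46 is simple to tabulate. The Python sketch below counts, for digits arranged on an oriented circle, how often digit i is followed by digit j, with the last digit followed by the first. The function name, the `base` parameter, and the sample stream are assumptions for illustration; the exact probability formula of the paper is not reproduced.

```python
def circular_pair_counts(digits, base=10):
    """Return the matrix f with f[i][j] = number of digits i followed by j,
    reading the sequence around an oriented circle."""
    f = [[0] * base for _ in range(base)]
    n = len(digits)
    for pos in range(n):
        i = digits[pos]
        j = digits[(pos + 1) % n]  # wrap around to close the circle
        f[i][j] += 1
    return f


f = circular_pair_counts([0, 1, 1, 0, 1], base=2)
print(f)
```

Because the arrangement is circular, each row sum and the matching column sum both equal the number of times that digit occurs.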







47. A Combinatorial Problem and Its Application to Probability Theory, T. V. 
Narayana, McGill University. 


A quasi order called k-domination is defined on the r-partitions of two integers m and n. 
An explicit expression for the number of k-dominations of the r-partitions of n by those of 
m is derived. This result is extended and shown to be a generalization of the “problème 
du scrutin” of D. André. Two classes of coin-tossing problems are solved as an application 
of this result. A number of combinatorial identities and the solution of a class of difference 
equations are obtained by probability methods. The relation of this problem to the recur- 
rent events of Feller in the case of coin tossing is briefly discussed. 


48. The Bayesian Inference Problem in Stochastic Systems, Max A. Woodbury, 
George Washington University. 


In an experimental or environmental stochastic system, the possible inputs to the 
controlled stochastic process are represented by stochastic mappings of the internal states 
of the system into each other. The observable outputs are assumed to be the result of a 
stochastic mapping from the internal states of the system to the set of possible outputs. 
In the case where the inputs only are known, the general formula for the a posteriori dis- 
tribution at a given time is the result of applying the product of the input stochastic matrices 
to the a priori distribution vector. If, however, account is taken of the information pro- 
vided by the output the result is expressible in linear terms only if the requirement for a 
normalized probability vector is dropped. The relationship of this result to the stochastic 
behavior models of Rosenblatt, Flood and Mosteller is discussed. (The research covered 
by this abstract was supported by the Office of Naval Research.) 
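
The linear update described in abstract 48 can be sketched in Python. Each input is a row-stochastic matrix acting on the distribution over internal states, and each observed output multiplies the vector by the probability of that output in each state. If the vector is left unnormalized, every step is linear in the prior. The row-vector convention, the ordering of input before output, and all names and numbers here are assumptions for illustration, not the notation of the paper.

```python
def apply_input(v, M):
    """Row vector v times a row-stochastic transition matrix M."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def filter_unnormalized(prior, inputs, outputs, B):
    """Unnormalized a posteriori vector over internal states.

    inputs  : list of transition matrices, one per step
    outputs : observed output index after each input
    B[s][y] : probability of output y when the internal state is s
    """
    v = list(prior)
    for M, y in zip(inputs, outputs):
        v = apply_input(v, M)
        v = [v[s] * B[s][y] for s in range(len(v))]  # weight by output likelihood
    return v


def normalize(v):
    """Rescale to a probability vector (the only non-linear step)."""
    total = sum(v)
    return [x / total for x in v]


M = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.7, 0.3], [0.1, 0.9]]
post = filter_unnormalized([0.5, 0.5], [M], [1], B)
print(post, normalize(post))
```

The sum of the unnormalized vector is the probability of the observed outputs, so normalization can be postponed to the end without loss.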


49. Some Nonparametric Generalizations of Multivariate Analysis and Analysis 
of Variance, S. N. Roy, University of North Carolina. 


With observed frequency data arranged in a multi-way table, assuming that the observa- 
tions are independent in probability, there will be, under any hypothesis, (i) a single multi- 
nomial distribution or (ii) the product of a number of separate multinomial distributions 
according as (i) only the total number of observations is supposed to be fixed from sample 
to sample or (ii) marginal frequencies in certain directions of the table are supposed to be 
fixed. An attempt is made at a systematic elaboration of the historically prior ideas of 
Barnard and Pearson (Biometrika, 1947), to (i) multivariate analysis, starting from a 
single multinomial and framing hypotheses suitable to multivariate analysis situations 
and to (ii) analysis of variance, starting from the product of an appropriate number of 
multinomials and framing hypotheses suitable to analysis of variance situations. The 
theorems used are those of Cramér [Mathematical Methods of Statistics, Chapter 30] and 
some other theorems which can be proved on the same lines. The conditional probability 
approach is altogether abandoned. 


50. Further Remarks on Measures of Association for Cross-Classifications, 
LEO A. GOODMAN, University of Chicago, and WILLIAM H. KRUSKAL,
Universities of California and Chicago. 


Measures of association discussed by the authors previously (J. Amer. Stat. Assn., 49 
(1954), 732-64) are considered further, especially in regard to the sampling distributions 
of their sample analogues. Asymptotic distributions are obtained for a number of cases, 
and numerical investigations of the accuracy (qua approximations) of these asymptotic 
distributions are described. 


51. Uniformly Consistent Sequences of Multiple-Decision Rules, WILLIAM 
JACKSON HALL, University of North Carolina. 


Suppose $x$ has an unknown distribution function $F$, belonging to one of $m$ disjoint classes
$\omega_1, \cdots, \omega_m$, and suppose $A_1, \cdots, A_m$ are corresponding alternative decisions, one of
which is to be chosen by a multiple-decision rule (m-d.r.) $D_n$ after taking a sample of size $n$.
$D_n$ is defined by $\phi^n(x) = [\phi_1^n(x), \cdots, \phi_m^n(x)]$, $\phi_i^n \ge 0$, $x$ denoting the sample, where the $\phi_i^n$'s
sum over $i$ to unity. $\phi_i^n(x)$ is the probability that $D_n$ chooses $A_i$ when $x$ is observed.
Definition 1: $\{\phi^n(x)\}$, $n = 1, 2, \cdots$, defines a “uniformly consistent sequence (u.c.s.) of m-d.r.'s
$\{D_n\}$ for discriminating among $\omega_1, \cdots, \omega_m$” if $\lim_{n \to \infty} \inf_{F \in \omega_i} E_F \phi_i^n(X) = 1$ $(i = 1, \cdots, m)$.
Definition 2: $\phi^n(x)$ defines a “non-trivial m-d.r. $D_n$ for discriminating among $\omega_1, \cdots, \omega_m$”
if $\sum_{i=1}^{m} \inf_{F \in \omega_i} E_F \phi_i^n(X) > 1$. Theorem 1: A necessary and sufficient condition for the existence
of a u.c.s. of m-d.r.'s for discriminating among $\omega_1, \cdots, \omega_m$ is that there exist non-trivial
2-d.r.'s for discriminating between $\omega_i$ and $\omega_j$ for some $n_{ij}$ (sample size) for every $i \ne j$. Results
of Hoeffding (unpublished) and Berger and Wald (Ann. Math. Stat., Vol. 20 (1949), pp. 104-9)
are adapted to supply some necessary and sufficient conditions, respectively, for the
existence of non-trivial 2-d.r.'s. Theorem 2: A necessary and sufficient condition for the
existence of a most economical m-d.r. relative to any $(\alpha_1, \cdots, \alpha_m)$, or relative to any $(\beta_{ij})$, for
discriminating among $\omega_1, \cdots, \omega_m$ (Hall, Abstract, Ann. Math. Stat., Vol. 25 (1954), p. 814)
is that there exist a u.c.s. of m-d.r.'s for discriminating among $\omega_1, \cdots, \omega_m$.


52. Some Hypergeometric Series Distributions Occurring in Birth-and-Death 
Processes at Equilibrium, (Preliminary Report), WILLIAM JACKSON HALL,
University of North Carolina, (By Title). 


Some time-homogeneous birth-and-death processes at equilibrium are considered in
which the birth and death rates are “stimulated” by “overcrowding.” Generally, under
mild restrictions, $p_n$, the distribution of population size $n$, is proportional to $\Lambda_n/(M_n\, n!)$,
where $\Lambda_n = \lambda_0 \lambda_1 \cdots \lambda_{n-1}$ and $M_n = \mu_1 \mu_2 \cdots \mu_n$; $\lambda_n$ ($\mu_n$) is the birth (death) rate when the
population size is $n$. If $\lambda_n$ is quadratic in $n$ (i.e., constant immigration rate and reproduction
rate is linear in $n$) and $\mu_n$ linear in $n$, then $p_n$ is shown to be proportional to the $(n + 1)$th
term in a general hypergeometric series, a four parameter distribution. If $\lambda_n$ and $\mu_n$ are both
linear (constant reproduction rate), $p_n$ is proportional to the $(n + 1)$th term in a confluent
hypergeometric series, a three parameter distribution. In the same manner, using a constant
death rate, $p_n$ is proportional to the $(n + 1)$th term in a negative binomial series,
as is well known; and, with no reproduction, an exponential series (Poisson distribution)
is obtained. Each distribution is a limiting form of the preceding one. Generating functions,
moments, and approximate estimates by the method of moments of the parameters of the
hypergeometric series distributions are derived.
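
The special cases can be checked numerically with a short Python sketch. It assumes the proportionality is read as $p_n \propto \Lambda_n/(M_n\, n!)$ with $\Lambda_n = \lambda_0 \cdots \lambda_{n-1}$ and $M_n = \mu_1 \cdots \mu_n$, and it truncates the series at a finite `n_max`. The function name, the truncation, and the rate values are assumptions made for illustration.

```python
from math import exp, factorial


def equilibrium_distribution(birth, death, n_max):
    """Return p_0..p_n_max with p_n proportional to Lambda_n / (M_n * n!).

    Weights are built by the ratio w_n / w_(n-1) = birth(n-1) / (death(n) * n),
    which avoids forming large factorials explicitly.
    """
    weights = [1.0]
    for n in range(1, n_max + 1):
        weights.append(weights[-1] * birth(n - 1) / (death(n) * n))
    total = sum(weights)
    return [w / total for w in weights]


# No reproduction, constant death rate: the Poisson case with mean 2.
poisson_case = equilibrium_distribution(lambda n: 2.0, lambda n: 1.0, 80)

# Linear birth rate, constant death rate: the negative binomial case
# (here with exponent 3 and ratio 0.5, both chosen arbitrarily).
negbin_case = equilibrium_distribution(lambda n: 0.5 * (3 + n), lambda n: 1.0, 200)
```

Under these assumptions the first case reproduces the Poisson probabilities and the second gives a negative binomial with $p_0 = (1 - 0.5)^3$.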


53. Some General Aspects of Stochastic Approximations, TOSIO KITAGAWA,
Iowa State College. 


As one continuation of random integration introduced by the author, some general 
aspects of stochastic approximations will be discussed specifically in reference to the risk 
function formulations. Our stochastic approximations are concerned with the various 
problems of (a) solutions of equations, (b) interpolation problems, (c) mapping problems, 
and (d) numerical differentiations. 





54. The Analysis of Incidence Rates Under Multiple Classifications of the 
Population, (Preliminary Report), WYMAN RICHARDSON, University of
North Carolina. 


A population is classified two ways into cells, $N_{ij}$. The number of cases, $A_{ij}$ (of some
disease, for instance), is assumed to have, in one model, a Poisson distribution with
parameter $N_{ij}p_{ij}$, and in another, a binomial $(Q_{ij}, N_{ij})$ distribution. $Q_{ij}$ is assumed to be equal
to $f(\theta_i, \psi_j)$. The hypothesis that the $\psi_j$ are all equal can be tested in each model by $\chi^2$, with expected
frequencies in each cell of $N_{ij}A_{i\cdot}/N_{i\cdot}$ (where $A_{i\cdot} = \sum_j A_{ij}$, etc.). Maximum likelihood
equations are derived for the case $f(\theta_i, \psi_j) = \theta_i\psi_j$. It is shown that, except for a
multiplicative factor, there is a single solution of these equations, which can be obtained by
efficient iterative procedures. This result holds when there are $k$ classifications. In the
Poisson model, these estimates are sufficient. A large sample test of the hypothesis that the $\psi_j$
are all equal against the alternative $Q_{ij} = \theta_i\psi_j$ is to compute
$\chi^2 = 2[\sum_i A_{i\cdot} \log_e (\hat\theta_i N_{i\cdot}/A_{i\cdot}) + \sum_j A_{\cdot j} \log_e \hat\psi_j]$.
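
A minimal sketch of one iterative scheme for the Poisson model with $Q_{ij} = \theta_i\psi_j$ follows. It alternates the two likelihood equations $\theta_i = A_{i\cdot}/\sum_j N_{ij}\psi_j$ and $\psi_j = A_{\cdot j}/\sum_i N_{ij}\theta_i$, and fixes the multiplicative factor by setting $\psi_1 = 1$. This is an assumed illustration, not necessarily the efficient procedure the author has in mind; the function name and the data are hypothetical.

```python
def fit_multiplicative_poisson(N, A, iterations=200):
    """Alternate the likelihood equations for theta_i and psi_j (Poisson model)."""
    rows, cols = len(N), len(N[0])
    row_cases = [sum(A[i]) for i in range(rows)]
    col_cases = [sum(A[i][j] for i in range(rows)) for j in range(cols)]
    psi = [1.0] * cols
    theta = [0.0] * rows
    for _ in range(iterations):
        theta = [row_cases[i] / sum(N[i][j] * psi[j] for j in range(cols))
                 for i in range(rows)]
        psi = [col_cases[j] / sum(N[i][j] * theta[i] for i in range(rows))
               for j in range(cols)]
        scale = psi[0]                      # remove the arbitrary common factor
        psi = [p / scale for p in psi]
        theta = [t * scale for t in theta]
    return theta, psi


# Made-up populations N[i][j] and case counts A[i][j], for illustration only.
N = [[1000.0, 2000.0, 1500.0], [800.0, 1200.0, 2500.0]]
A = [[10.0, 30.0, 12.0], [12.0, 25.0, 40.0]]
theta, psi = fit_multiplicative_poisson(N, A)
fitted = [[N[i][j] * theta[i] * psi[j] for j in range(3)] for i in range(2)]
```

At the solution the fitted cell frequencies reproduce the observed row and column totals of cases, which gives a simple check on convergence.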


55. Estimation of Percentiles by Order Statistics, A. E. SARHAN, University of
North Carolina. 


In previous work of the author (‘‘Estimation of the mean and standard deviation by 
order statistics,’’ Ann. Math. Stat., Vol. 25 (1954), and Part II, Ann. Math. Stat., Vol. 26 
(1955)) the means and standard deviations of certain distributions were estimated by the 
best linear combinations of the ordered sample values. In the present paper, the same 
methods are used to derive a general expression for estimation of the jth percentile and its 
variance. From this expression and by making use of previous results, the jth percentile is 
estimated for certain distributions. As special cases, the estimates of the 50th percentile 
(the population median) and of the semi-interquartile range are calculated. 


56. On Renewal Theory, Counter Problems, and Quasi-Poisson Processes,
WALTER L. SMITH, University of North Carolina. 


Let $\{t_i\}$ be a renewal process, i.e., a sequence of non-negative, independent, identically
distributed random variables which are not zero with probability one. Let $\mu_r = E t_i^r$, and
define $n_t$ by $\sum_{1}^{n_t} t_i \le t < \sum_{1}^{n_t+1} t_i$ (taking $n_t = 0$ if $t_1 > t$). If $H(t) = E n_t$, then it is shown
that (i) if $\mu_1 < \infty$, then a necessary and sufficient condition for $\mu_2 < \infty$ is that
$\lim_{t \to \infty} \{H(t) - t/\mu_1\} = \theta$ exist and be finite, when $\mu_2 = 2\mu_1^2(1 + \theta)$; (ii) if $t_i = u_i + v_i$ where $\{u_i\}$ and $\{v_i\}$
are independent renewal processes, the $v_i$ having a negative exponential distribution, then
$\lim_{t \to \infty} H'(t)$ always exists. The results (i) and (ii) render the calculation of the asymptotic
properties of a certain electronic counter process, previously studied by Hammersley,
straightforward. If $H(t)$ is linear in $t$ for all $t > \tau$, for some finite $\tau$, the process is called
quasi-Poisson, and the class of quasi-Poisson processes is not empty. Let $Y(x; t) =
\Pr (\sum_{1}^{n_t+1} t_i \le t + x)$. Then it is shown that a necessary and sufficient condition for $Y(x; t)$
to be independent of $t > \tau$ is that $\{t_i\}$ be quasi-Poisson. When $\{t_i\}$ is quasi-Poisson, $\mu_r < \infty$
for all $r$, and the study of the effects of an automatic self-paralyzing mechanism on $\{t_i\}$, of
a type in use for blood-cell counting, becomes trivial.
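
Result (i) can be illustrated by simulation. The sketch below assumes lifetimes uniform on (0, 1), so that $\mu_1 = 1/2$ and $\mu_2 = 1/3$, and compares a Monte Carlo estimate of $H(t) - t/\mu_1$ at a moderately large $t$ with the value $\theta = \mu_2/(2\mu_1^2) - 1$ obtained by rearranging $\mu_2 = 2\mu_1^2(1 + \theta)$. The choice of distribution, the value of $t$, the number of paths, and the function name are all assumptions made for the example.

```python
import random


def mean_renewal_count(t, draw, n_paths, rng):
    """Monte Carlo estimate of H(t) = E n_t, the expected number of renewals in [0, t]."""
    total = 0
    for _ in range(n_paths):
        elapsed = draw(rng)           # time of the first renewal
        count = 0
        while elapsed <= t:           # n_t counts partial sums not exceeding t
            count += 1
            elapsed += draw(rng)
        total += count
    return total / n_paths


rng = random.Random(0)
mu1, mu2 = 0.5, 1.0 / 3.0                 # first two moments of uniform(0, 1)
theta = mu2 / (2.0 * mu1 ** 2) - 1.0      # limiting value of H(t) - t / mu1
t = 10.0
H_est = mean_renewal_count(t, lambda r: r.random(), 100_000, rng)
excess = H_est - t / mu1                  # should be close to theta
```

For exponential lifetimes the same calculation gives $\theta = 0$, in agreement with $H(t) = t/\mu_1$ for the Poisson process.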


57. On the Construction of Significance Tests on the Circle and the Sphere,
G. S. WATSON, The Australian National University, and E. J. WILLIAMS,
Commonwealth Scientific and Ind. Res. Organization, S. Melbourne,
(By Title). 


The probability density proportional to $\exp (k \cos \theta)$, where $k$ is a precision constant and
$\theta$ is the angle between an observed vector and a population mean vector or polar vector, has
been considered in two and three dimensions by several authors. Significance tests are 
required to test (i) that $k = k_0$, a prescribed value, or that several populations have the
same value of k, and (ii) that the polar direction of a population has prescribed direction 
cosines or that several populations have the same polar vectors. Tests of these hypotheses 
are given which are free of nuisance parameters. They are based on conditional distribu- 
tions formed by holding constant sufficient statistics. Inequalities and approximations are 
suggested to make the tests easy to apply in practice. The arithmetic examples given sug- 
gest that, in three dimensions, the tests given by one of us (G.S.W.) elsewhere will be 
satisfactory. 


58. Estimation of Individual Variations in an Unreplicated Two-Way Classification,
(Preliminary Report), THOMAS S. RUSSELL and RALPH ALLAN BRADLEY,
Virginia Polytechnic Institute, (By Title). 


Consider a two-way classification, the usual model $x_{ij} = \mu + \tau_i + \beta_j + \epsilon_{ij}$, $i = 1, \cdots, n$,
$j = 1, \cdots, r$ (e.g., $r$ chemists and $n$ batches or $r$ judges and $n$ items) and the usual
assumptions, except let var $(\epsilon_{ij}) = \sigma_j^2$. It was assumed that an estimator $Q_j$ of $\sigma_j^2$ should be a
quadratic form in the $(r - 1)(n - 1)$ linear contrasts usually ascribed to error. Reasonable
requirements on such a quadratic form led to the estimator given by $(n - 1)(r - 1)(r - 2)Q_j =
r(r - 1)E_j - E$, where $E_j = \sum_i (x_{ij} - x_{i\cdot} - x_{\cdot j} + x_{\cdot\cdot})^2$ and $E = \sum_j E_j$, the usual
error sum of squares. $Q_j$ is the estimator previously suggested by Ehrenberg (Biometrika,
Vol. 37 (1950), pp. 347-357). $Q_j$ has been shown to be the maximum likelihood estimator
of $\sigma_j^2$ only when $r = 3$. When $\sigma_j^2 = \sigma^2$ for all $j$, the distribution of $Q_j/\sigma^2$ has been shown to
be that of $r\chi^2_{n-1}/(n - 1)(r - 1) - \chi^2_{(n-1)(r-2)}/(n - 1)(r - 1)(r - 2)$, the two $\chi^2$'s being
independent. $Q_j/E$ has been shown to be a monotone function of an $F$ with $(n - 1)(r - 2)$
and $(n - 1)$ degrees of freedom formed from those $\chi^2$'s. The joint distribution of the $Q$'s
has been considered and further research on various aspects of the problem is underway.
(Work supported by A.R.S., U.S.D.A. and Q.M.R. and D., U.S. Army.) 
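
The estimator can be computed directly from its defining relation $(n - 1)(r - 1)(r - 2)Q_j = r(r - 1)E_j - E$. The Python sketch below does so for a small made-up table with $n = 4$ rows and $r = 3$ columns; the function name and the data are hypothetical, and $r \ge 3$ is required for the divisor to be non-zero.

```python
def individual_variance_estimates(x):
    """Return ([Q_1, ..., Q_r], E) for an n-by-r table x, with r >= 3."""
    n, r = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (n * r)
    row_mean = [sum(row) / r for row in x]
    col_mean = [sum(x[i][j] for i in range(n)) / n for j in range(r)]
    # E_j: sum of squared interaction residuals in column j.
    E_j = [sum((x[i][j] - row_mean[i] - col_mean[j] + grand) ** 2 for i in range(n))
           for j in range(r)]
    E = sum(E_j)                                  # usual error sum of squares
    divisor = (n - 1) * (r - 1) * (r - 2)
    Q = [(r * (r - 1) * e - E) / divisor for e in E_j]
    return Q, E


# Made-up measurements: 4 batches (rows) by 3 chemists (columns).
x = [[10.1, 9.8, 10.5],
     [11.2, 11.0, 11.9],
     [9.7, 9.9, 9.1],
     [10.4, 10.3, 11.0]]
Q, E = individual_variance_estimates(x)
error_mean_square = E / ((len(x) - 1) * (len(x[0]) - 1))
```

Summing the defining relation over $j$ shows that the $Q_j$ average to the usual error mean square, which serves as a check on the arithmetic. Because each $Q_j$ is a difference of two sums of squares, an individual estimate can be negative.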


59. Empirical Bayes Estimation, (Preliminary Report), M. V. JOHNS, Jr.,
Columbia University. 


Let $\mathbf{X} = (X_1, X_2, \cdots, X_r)$ where the $X_i$'s are independent discrete valued random
variables with a common c.d.f. $F(x \mid \lambda)$, and where there exists an a priori probability measure
$p$ over a $\sigma$-algebra of subsets of the values of the parameter $\lambda$, so that the parameter is also
a random variable $\Lambda$. Suppose that it is desired to estimate $\theta(\Lambda) = E(X_i \mid \Lambda = \lambda)$, using the
risk function $E(\phi(\mathbf{X}) - \theta(\Lambda))^2$, where $\phi(\mathbf{x})$ is any estimator. Let the Bayes estimator
(depending on $p$ and $F(x \mid \lambda)$) be $\phi^*(\mathbf{x}) = E(\theta(\Lambda) \mid \mathbf{X} = \mathbf{x})$. Suppose now that $p$ and the form
of $F$ are unknown, but that $n$ independent $(r + 1)$-component random vectors $\mathbf{X}_1, \mathbf{X}_2,
\cdots, \mathbf{X}_n$, each having the same probability structure as $\mathbf{X}$, are available. Then a
“non-parametric” estimator $\phi_n(\mathbf{x}) = \phi_n(\mathbf{x}; \mathbf{X}_1, \mathbf{X}_2, \cdots, \mathbf{X}_n)$ is given having the property that
$\lim_{n \to \infty} E(\phi_n(\mathbf{X}) - \theta(\Lambda))^2 = E(\phi^*(\mathbf{X}) - \theta(\Lambda))^2$ for any $p$ and $F$ subject to certain mild
restrictions. The general case where the $X_i$'s are not necessarily discrete valued is also
considered. Similar results are obtained for several cases (considered by Robbins in the Third
Berkeley Symposium on Mathematical Statistics and Probability) where $p$ is unknown but
$F$ belongs to a specified one-parameter family of probability distributions and where the
value of the parameter is to be estimated. The behavior of these empirical Bayes estimators
is also investigated for finite $n$ for certain special cases.





NEWS AND NOTICES 
Readers are invited to submit to the Secretary of the Institute news items of interest 


Personal Items 


T. A. Bancroft, director of the Statistical Laboratory, Iowa State College, has
been in Mexico from mid-August to early October on behalf of the Food and
Agriculture Organization, UN. His assignment was to advise the Government of 
Mexico and FAO on the present status of the use of experimental designs and 
survey techniques in Mexico and to help prepare for the 1956 FAO-sponsored 
training center in Mexico. 

R. E. Beckwith has left the Operations Research Group, Case Institute of 
Technology, to join the staff of the Statistical Laboratory, Purdue University, 
Lafayette, Indiana. 

Dr. Charles B. Bell, Jr., formerly research engineer at Douglas Aircraft 
Company, has been appointed Assistant Professor of Physics at Xavier Uni- 
versity, New Orleans, Louisiana. 

Dr. Ernest P. Billeter has left the Statistical Office of the City of Zurich to be- 
come a Scientific Assistant to the Bank for International Settlements (BIS) 
in Basel. 

Benjamin Buchbinder is now a statistician in the Commissioned Corps of the 
U.S. Public Health Service, assigned to the Division of Nursing Resources of 
the Department of Health, Education and Welfare in Washington, D. C. 

Donald L. Burkholder, who received his Ph.D. in mathematical statistics at 
the University of North Carolina in 1955, is now an Assistant Professor in the 
Department of Mathematics of the University of Illinois. 

Theodore Colton is now a statistician in the Commissioned Corps of the U. S.
Public Health Service in the Program Evaluation and Reports Branch of the 
Division of Hospital and Medical Facilities. 

John H. Cover has joined the HRAF South Asia Project at the University of 
California until June 30, 1956, at which time he will return to his position with 
the Bureau of Business and Economic Research at the University of Maryland. 

Don A. Davis of Seattle, Washington, is temporarily located at 145-C Baker 
Village, Columbus, Georgia, where he is with the Army at Fort Benning. 

Claude de Courval is now with the Institut de Microbiologie de l'U. de Montreal,
in Montreal, Que., Canada. 

Professor Robert Dorfman has been appointed as Associate Professor in the 
Department of Economics, Harvard University. 

Benjamin Epstein has been promoted to a full professorship at Wayne Uni- 
versity. During the 1955-56 academic year he is on Sabbatical leave at Stanford 
University. 

W. T. Federer returned to Cornell University in September, 1955, after one 
year, from Honolulu, Hawaii, where he was head of the Department of Experi- 
mental Statistics, Experiment Station, Hawaiian Sugar Planters’ Association 
and a consultant in Statistics for the Pineapple Research Institute. 





Dr. Rudolf J. Freund, who received his Ph.D. degree in statistics at North 
Carolina, has joined the staff of the Department of Statistics as Associate 
Professor at the Virginia Polytechnic Institute. 

Research Professor Bernard Friedman of New York University is on leave of 
absence and has been appointed to a visiting professorship at the University of 
California, Berkeley. 

After receiving a Ph.D. in Statistics from the University of North Carolina in
June 1955, Seymour Geisser was at the Statistical Engineering Laboratory, 
N.B.S., Washington, D. C., before accepting a commission with the U. S. Public
Health Service four months later. He is now with the N.I.H. at Bethesda, 
Maryland. 

Leo A. Goodman, formerly Associate Professor, has been promoted to Pro- 
fessor of Statistics and Sociology at the University of Chicago. 

Earl L. Green has returned to his position as professor of genetics at Ohio 
State University after being on leave of absence for two years while serving as 
Geneticist, Division of Biology and Medicine, U. S. Atomic Energy Com- 
mission. 

Bernard Harris is on leave from his position as mathematician, Defense De- 
partment, and Instructor in Statistics, George Washington University, and will 
be at the Stanford University Applied Mathematics and Statistics Laboratory
for one year. 

Paul G. Homeyer has been in Mexico for most of the fall quarter of 1955 as 
agricultural statistician with the Food and Agriculture Organization, United 
Nations. He returns to the Statistical Laboratory, Iowa State College, in
December. 

Robert H. Hoskins of the John Hancock Mutual Life Insurance Company 
recently received an appointment as Assistant Group Actuary. 

Dr. Stanley Isaacson has resigned his position as Assistant Professor of 
Statistics at Iowa State College in order to accept a position as a Statistician with 
the Semiconductor Department of Westinghouse Electric Corporation. 

Dr. Koichi Ito, formerly of Institute of Statistics, University of North Caro- 
lina, returned to Japan in August, 1955, and is now Lecturer in Statistics at 
Nanzan University, Nagoya, Japan. 

Ralph B. Johnson resigned from Clemson College and accepted a position as 
Assistant Professor of Mathematics at Catawba College, Salisbury, North 
Carolina. 

After receiving his master's degree from Iowa State College in August
1955, Lawrence F. Jones accepted a position in the Statistical Control Section of
the Semiconductor Department of the Westinghouse Electric Corporation. 

Dr. Tosio Kitagawa joined the Statistical Laboratory of Iowa State College 
as visiting professor for the fall quarter of 1955. He has been on leave from the 
position of professor of theory of probability and mathematical statistics at 
Kyusyu University, Fukuoka, Japan, since last summer. 

Dr. Morris Krauss has begun postdoctoral research at the National Bureau 
of Standards. He is one of seven young graduate scientists to be selected for the 
Postdoctoral Research Associateship program sponsored by the National Acad- 
emy of Sciences-National Research Council and the Bureau. 

Boyd Ladd has been appointed research manager of Southwest Research 
Institute’s department of industrial economics. 

Ferdinand Lemus, formerly a graduate assistant at the Statistical Laboratory, 
Iowa State College, has accepted a position with the Experimental Design and 
Statistical Analysis Group of Westinghouse Electric Corporation, Pittsburgh, 
Pennsylvania. 

Craig A. Magwire has joined the staff of the Department of Mathematics and 
Mechanics at the U. S. Naval Postgraduate School, Monterey, California, as 
Associate Professor of Mathematics. 

Allen L. Mayerson has completed his year as a Fulbright scholar at the
Sorbonne, and has returned to his position as Principal Actuary of the N. Y. 
State Insurance Dept. While abroad he was elected a member of the Swiss 
Actuarial Society and of the French Institute of Actuaries, and did some research 
into European actuarial practices, in addition to his mathematical studies at the 
Sorbonne. 

Franklin S. McFeely has completed the work for the Ph.D. in Statistics at
Virginia Polytechnic Institute and is now on the staff of the University of 
Colorado School of Medicine. 

Mrs. Mary G. Natrella has rejoined the Applied Mathematics Division of the 
National Bureau of Standards where she will serve on the staff of the Statistical 
Engineering Laboratory. 

Joseph A. Navarro received his Ph.D. in mathematical statistics from Purdue 
University in August 1955 and is at present employed at the General Electric 
Advanced Electronics Center in Ithaca, New York. 

John Neter, formerly of Syracuse University, has accepted an appointment as 
Associate Professor in the School of Business Administration, University of 
Minnesota, Minneapolis, Minnesota. 

Dr. Richard E. Nettleton has begun postdoctoral research at the National 
Bureau of Standards. He is one of seven young graduate scientists to be selected 
for the Postdoctoral Research Associateship program sponsored by the National 
Academy of Sciences-National Research Council and the Bureau. 

Bernard Ostle has been promoted to Professor of Mathematics and Statis- 
tics at Montana State College. 

Gilbert I. Paul, formerly a student at N. C. State College, Raleigh, N. C., is
now teaching courses in statistics and in statistical genetics at McGill University, 
Montreal, Canada. 

Dr. Dayle D. Rippe has accepted a position as Operations Research Analyst in 
the General Office of General Mills, Inc., Minneapolis. During the three years 
prior to this he was an Operations Analyst with the Strategic Air Command, 
Omaha, Nebraska. 

Jagdish Sharon Rustagi has completed his work for Ph.D. degree in Statistics 
at Stanford University, California, and has a teaching job in the Department of 
Mathematics, Carnegie Institute of Technology, Pittsburgh, Pennsylvania, for 
the academic year, 1955-56. 

Richard H. Shaw resigned from the U. S. Naval Ordnance Plant in Indi- 
anapolis, to take a position with General Dry Batteries, Inc., in Cleveland. 

Jack Silber of Roosevelt University spent the summer of 1955 as Consultant 
to the Operations Analysis Office at the U. S. Air Force Missile Test Center,
Patrick AFB, Florida. 

Upon completion of his Ph.D. in the Dept. of Mathematics at Purdue Uni- 
versity, John A. Tischendorf received an appointment with the Commissioned 
Corps of the U.S. Public Health Service. He is now stationed at Boston, Massa- 
chusetts, serving as statistician on a study conducted by the Chronic Disease 
Program. 

Donald R. Truax has been appointed Research Fellow in Mathematics at the 
California Institute of Technology after receiving his Ph.D. in statistics at 
Stanford University. 

Howard G. Tucker received his Ph.D. at the University of California in June, 
1955, and is now Assistant Professor of Mathematics at the University of 
Oregon. 

Mr. Robert S. Walleigh has rejoined the staff of the National Bureau of Standards
as Assistant Director for Administration. In this position, he will serve as the
Director’s principal staff advisor on management matters, and supervise the 
operation of the administrative divisions that support the Bureau’s technical 
program. 


Dr. R. Lowell Wine, who received his Ph.D. degree in statistics in June 1955 
from the Virginia Polytechnic Institute, has joined the staff there in the Depart- 
ment of Statistics as Associate Professor. 

Hans-Joachim Zindler has been appointed as consultant for Mathematical 
Statistics in the “Statistisches Bundesamt,” Abteilung VIII, in Wiesbaden,
Germany. 




New Members 


The following persons have been elected to membership in the Institute 
August 6, 1955 to November 9, 1955 


Anderson, Allan G., Ph.D. (University of Michigan), Mathematician, Jones and Laughlin 
Steel Corporation, Research Center, 900 Agnew Road, Pittsburgh 30, Pennsylvania. 

Baldwin, Roger R., M. A. (Columbia University), Graduate Student, Princeton University, 
Mathematics Department, Fine Hall, Princeton, New Jersey, 282 W. 11th Street, New 
York 14, New York. 

Banerjee, D. P., D.Sc. (Dacca University) Department Head of Mathematics, Meerut 
College, Meerut U.P., India. 

Bessler, Stuart Alan, Bachelor of Industrial Engineering and Bachelor of Business
Administration (University of Minnesota), Graduate student and instructor, Department
of Economics, School of Business Administration, University of Minnesota,
Minneapolis 14, Minnesota, 3922 Basswood Road, Minneapolis 16, Minnesota. 

Bissinger, Barnard Hinkle, Ph.D. (Cornell University), Chairman Mathematics Dept., 
Lebanon Valley College, Annville, Pennsylvania. 

Brenna, Leroy S., M.S., (Kansas State College), Industrial Engineer, Eastman Kodak 
Company, 343 State Street, Rochester 4, New York, Strathmore Circle, Bldg. 6,
Apartment 3, Rochester 9, New York. 

Busch, Sister Mary Constance, C.S.A., B.S. in Ed. (Marian College), Math. Instructor, 
Marian College, 30 East Division Street, Fond du Lac, Wisconsin. 

Dempster, Arthur P., M.A. (Univ. of Toronto), Student, Princeton Univ., Dept. of Mathe- 
matics, Fine Hall, Box 708, Princeton, New Jersey. 

Drenick, Rudolf F., Ph.D. (Univ. of Vienna), Manager Analytical Group, Radio Corpora- 
tion of America, Bldg. 10-8, Camden 2, New Jersey. 

Ferris, George Emery, M.A. (Columbia Univ.), Graduate Assistant, Cornell University, 
Ithaca, New York, 114 Eddy Street, Ithaca, New York. 

Feyerherm, Arlin M., Ph.D. (Iowa State College), Assistant Professor of Mathematics,
Kansas State College, Manhattan, Kansas. 

Fiske, Edwards N., B.S. (Roanoke College), Graduate Student, Virginia Polytechnic 
Institute, Blacksburg, Virginia, Box 3611, Virginia Tech. Station V, Blacksburg,
Virginia.

Gregoire, Jean (Mr.), B.Sc. (Laval University), Graduate Student and assistant, Uni- 
versity of Manitoba, Winnipeg, Manitoba, Canada, 389 6th Avenue, Grand Mere, Que. 
Canada. 

Hadley, Hershel N., B.A. (Whitman College), Head, Statistical Division, Inspection and 
Test Department, Naval Powder Factory, Indian Head, Maryland, 5706 Fortieth 
Avenue, Hyattsville, Maryland. 

Helms, Lester L., M.S. (Purdue Univ.), Mathematician, Operations Research Group, 
Convair, Pomona, California. 

Hermann, Philip, M.S. (Case Institute of Technology), Supervisor Applied Mathematics, 
Jones and Laughlin Steel Corporation, No. 3 Gateway Center, Pittsburgh, Pa. 

Kassler, Raymond, B.A. (Brooklyn College), Research Engineer, Boland and Boyce, Inc., 
Belleville 9, New Jersey, P.O. Box 441, Camden 1, New Jersey. 

Keats, Mortimer B., M.A. (George Washington University), Head, Statistics and Analysis 
Division, Quality Surveillance Department, U.S. Naval Powder Factory, Indian Head, 
Maryland, 2354 Skyland Place, S.E., Washington 20, D.C. 

Miller, Rupert Griel, Jr., A. B. (Princeton), Student, Stanford University, Statistical Lab., 
Sequoia Hall, Stanford, California. 

Ogawa, Junjiro, Ph.D. (Osaka University), Asst. Prof., Department of Mathematics, 
Faculty of Science, Osaka University, Japan, c/o The Dept. of Statistics, Univ. of North
Carolina, Chapel Hill, N.C. 

Remmenga, Elmer E., Ph.D. (Purdue Univ.) Asst. Prof., Department of Mathematics, 
Colorado A. and M., Fort Collins, Colorado, 1821 Crestmore Place, Fort Collins,
Colorado. 

Roberts, Howard R., B.Sc. (George Washington Univ.) Graduate Teaching Assistant, 
George Washington University, Department of Statistics, Washington 6, D.C., 2022 
G Street, N.W. Washington 6, D.C. 

Romani Miquel, José, Dipl. en Estadistica (Madrid Univ.), Colaborador, Instituto de In- 
vestigaciones Estadisticas, Serrano 123, Madrid, Spain, Harzenbusch 5, Madrid, Spain. 

Ryan, John M., Ph.D. (University of North Carolina), Mathematical Economist, United 
Gas Corporation, P. O. Box 1407, Shreveport, Louisiana. 

Sakaguchi, Minoru, (Tokyo Institute of Technology), Lecturer, Department of Mathe- 
matics, University of Electro-communication, Kojima-cho 14, Chohu, Tokyo, Japan. 

Schmid, Paul, Diplom der ETH (Eidg. Anstalt fur das forstliche Versuchswesen), Research 
Assistant ETH, Dipl. Math. ETH, Mathematician, Eidg. Anstalt fur das forstliche 
Versuchswesen, Tannenstr. 11, Zurich 6, Switzerland. 

Siegel, Sidney, Ph.D. (Stanford Univ.) Asst. Prof., Department of Psychology, Pennsyl- 
vania State University, University Park, Pennsylvania. 

Shtulman, Sidney, BBA (College of City of New York), Analytical Statistician, The Theory 
and Analysis Division of the Computation and Ballistics Department Federal Civil 
Service, Dahlgren, Virginia, Naval Proving Ground, Dahlgren Va. 

Siller, Harry, Ph.D. (New York University), Asst. Professor, Department of Mathematics, 
Hofstra College, Hempstead, New York, 1139 Beach Channel Drive, Far Rockaway, New
York. 

Snell, James Laurie, Ph.D. (Univ. of Illinois), Asst. Prof., Department of Mathematics, 
Dartmouth College, Hanover, N. H., 23 S. Park St., Hanover, N. H. 

Stenwick, Fern Caroline, Mathematician, David Taylor Model Basin, Carderock, Mary- 
land, 1039 Columbia Drive, Bucknell Manor, Alexandria, Virginia. 

Taylor, Robert J., M.S. (Virginia Polytechnic Institute), Mathematician, Naval Research 
Laboratory, Washington 25, D. C., 4761-B S. Capitol Terrace, S. W., Washington 24, 
D.C. 


Thomas, Raymond E., B. A. (George Washington University), Student, Graduate Assistant,
George Washington University, Washington 6, D.C., 4249 Hildreth St., S. E., Washing- 
ton 19, D. C. 


Trinkl, Frank N., A. M. (University of Michigan), Student, Graduate Assistant, Stanford 
University, Stanford, California, 213-14 Stanford Village, Stanford, Calif.

Turner, Nura Dorothea, M.S. (State Univ. of Iowa), Asst. Prof., Department of Mathe- 
matics, New York State College for Teachers, Albany, New York. 

Wolinez, G., M.Sc. (Hebrew University), Research Worker, Government of Israel, Hahir- 
iah, Tel-Aviv, Israel, Dizzengoff 204, Tel-Aviv, Israel. 




Cooperative Graduate Summer Sessions in Statistics 


The University of Florida, North Carolina State College, Virginia Polytechnic 
Institute and the Southern Regional Education Board are jointly sponsoring a 
series of cooperative summer sessions in statistics. 

The third of these summer sessions will be held at North Carolina State 
College, June 11 to July 20, 1956. A session is scheduled to be held at Virginia
Polytechnic Institute in 1957 and at the University of Florida in 1958. Each 
summer session lasts six weeks and each course carries approximately three 
semester hours of graduate credit. 

The 1956 session will be held jointly with the Institute in Quantitative Re- 
search Methods in Agricultural Economics, sponsored by the Social Science 
Research Council. Several statistics courses will be oriented towards economic 
applications. 

The combined faculty for the 1956 summer session and Institute at North 
Carolina State College will include: Professor R. L. Anderson, North Carolina 
State College; Professor Gertrude M. Cox, North Carolina State College; Pro- 
fessor David B. Duncan, University of Florida; Professor Alva L. Finkner, 
North Carolina State College; Dr. Arnold H. E. Grandage, North Carolina State 
College; Professor Robert J. Hader, North Carolina State College; Assistant 
Professor Cleon Harrell, North Carolina State College; Professor Earl O. Heady, 
Iowa State College; Professor Clifford G. Hildreth, Michigan State University; 
Professor Jack Levine, North Carolina State College; Professor Robert J. Mon- 
roe, North Carolina State College; and Assistant Professor Walter L. Smith, 
University of North Carolina. 

Courses to be offered this summer are: Statistical Methods I, Statistical Meth- 
ods II (Design of Experiments), Statistical Theory I (Probability and Parent 
Distribution), Statistical Theory II (Sampling Distributions and Inference), 
Sample Survey Designs, Advanced Analysis II, Advanced Calculus for Statistics, 
Stochastic Processes, Econometric Methods and Linear Programming. Lectures 
on Linear Equations (Matrix Algebra) and Production Functions will be given 
in the Institute program. 

Inquiries should be addressed to: 

Professor J. A. Rigney
Department of Experimental Statistics
North Carolina State College
Raleigh, North Carolina




Summer Sessions at Berkeley, California 


The 1956 summer program in the Department of Statistics of the University 
of California, Berkeley, California, will consist of two sessions: June 18 to July 
28 and July 30 to September 8. The faculty of the summer sessions will include 
Professor D. R. Cox of the University of North Carolina, Professor Grace E. 
Bates of Mount Holyoke College, and Professor David Blackwell and Mr. T. S.
Ferguson of the Department of Statistics of the University of California. 

The program includes two of the usual undergraduate courses in each session, 
adapted primarily to meet the needs of students transferring from other centers 
who would like to undertake advanced study at the University of California 
during the regular academic year. Also a graduate seminar will be conducted by 
Professor Blackwell. This seminar will allow for individual consultation for stu- 
dents working toward higher degrees. 




Fellowship in Experimental Statistics 


The Department of Industrial and Engineering Administration of the Sibley 
School of Mechanical Engineering at Cornell University announces a graduate 
fellowship in the area of experimental statistics, sponsored by the Standard Oil 
Company of Ohio. This fellowship carries a stipend of $2,250 for the academic 
year and is to be used to support work toward either the M.S. or Ph.D. degree. 

Applicants are expected to have suitable undergraduate preparation in some 
branch of engineering or in the physical sciences, but undergraduate training in 
statistics is not a prerequisite. The holder of the fellowship will be expected to
take courses in both applied and mathematical statistics selected from among the
offerings of the members of the Cornell Statistics Center.

Interested persons who desire additional information concerning this fellow- 
ship may write to Professor Robert E. Bechhofer at the above address. Applica- 
tion forms for this fellowship and for admission to the graduate school may be 
obtained from the Graduate School, 125 Edmund Ezra Day Hall, Cornell Uni- 
versity, Ithaca, New York, and should be filed not later than February 17, 1956, 
for April 1 award. Late applications will be considered only if an award has not 
been made by April 1. 




REPORT OF THE NEW YORK MEETING OF THE INSTITUTE 
OF MATHEMATICAL STATISTICS 


The sixty-eighth meeting of the Institute of Mathematical Statistics and the 
eighteenth annual meeting was held at the Hotel Biltmore, New York City, on 
December 27-30, 1955, in conjunction with the national annual meeting of the 
American Statistical Association and the Biometric Society (ENAR). A number 
of the sessions were joint with these two organizations. A Special Invited Paper 
entitled Stochastic Approximation was presented by Professor Cyrus Derman of 
Columbia University. The Rietz Lecture, entitled Probability in Statistics, was 
given by Professor William Feller of Princeton University. The following mem- 
bers of the Institute attended: 


Forman S. Acton, Beatrice Aitchison, William Robert Allen, John E. Alman,
Richard L. Anderson, Theodore W. Anderson, Fred C. Andrews, Harvey James 
Arnold, Herbert E. Arnold, Kenneth J. Arnold, Kenneth J. Arrow, Max As- 
trachan, R. R. Bahadur, J. C. Bain, Roger Raushenbush Baldwin, James B. 
Bartoo, Grace E. Bates, Geoffrey Beall, Helen P. Beard, Robert E. Bechhofer, 
Gordon H. Beckhart, Agnes Berger, Abraham J. Berman, A. T. Bharucha-Reid, 
Arnold M. Binder, Richard S. Bingham, Allan Birnbaum, Z. William Birnbaum,
David Blackwell, Archie Blake, Chester I. Bliss, Aaron Block, Julius R. Blum, 
Isadore Blumen, Robert M. Blumenthal, Paul Boschan, Raj. C. Bose, Albert H. 
Bowker, Ralph Allan Bradley, A. E. Brandt, Irwin D. J. Bross, Benjamin 
Buchbinder, Robert Wilbur Burgess, Paul J. Burke, Lyle D. Calvin, Mavis B. 
Carroll, Maria Castellani, Douglas G. Chapman, A. Charnes, Herman Chernoff, 
John T. Chu, Ira H. Cisin, Willard H. Clatworthy, Paul C. Clifford, William G. 
Cochran, Theodore Colton, William Stokes Connor, Jerome Cornfield, Louis J.
Cote, David R. Cox, Edwin Lory Cox, Gertrude M. Cox, Paul Charles Cox, 
Cecil C. Craig, Jean B. Crockett, Edwin Louis Crow, Lee Crump, Phelps P. 
Crump, Paz B. Culabutan, Joseph Francis Daly, Cuthbert Daniel, Reed B. 
Dawson, Jr., Besse B. Day, Ralph A. DeMarr, Francis R. Del Priore, Lucile 
Derrick, James L. Dolby, Thomas Donnelly, Robert Dorfman, Joseph Abraham 
Dresner, Acheson J. Duncan, David B. Duncan, Mary T. Dunleavy, Charles
William Dunnett, David Durand, Arthur M. Dutton, Meyer Dwass, A. Ross
Eckler, George L. Edgett, Sylvain Ehrenfeld, Harvey Eisenberg, Harry Eisen- 
press, Henry Eliner, Daniel R. Embody, Benjamin Epstein, Robert Marvin 
Exselsen, Walter T. Federer, William Feller, Thomas S. Ferguson, Robert
Ferber, George Emery Ferris, John W. Fertig, Edwards Ned Fiske, John C. 
Flanagan, Lester R. Frankel, David Frazier, Spencer M. Free, Jr., Harold A. 
Freeman, John E. Freund, Henry D. Friedman, Milton Friedman, Fred Frish- 
man, L. A. Gardner, Jr., Norman Robert Garner, Murray A. Geisler, Seymour 
Geisser, J. Lincoln Gerende, Dorothy M. Gilford, Leon Gilford, Ruth L. Gold, 
Leo A. Goodman, Nathaniel R. Goodman, Mina Haskind Gourary, Franklin A. 
Graybill, Bernard G. Greenberg, Samuel W. Greenhouse, Joseph A. Greenwood, 
Geoffrey Gregory, Thomas N. E. Greville, Harold Gulliksen, Emil J. Gumbel, 
Lee Gunlogson, Paul Gunther, Margaret Gurney, Robert John Hader, John S.
Hagan, Keet W. Halbert, William Jackson Hall, Max Halperin, E. Cuyler 
Hammond, James F. Hannan, Morris H. Hansen, Gordon M. Harrington, 
Theodore E. Harris, Boyd Harshbarger, Herman O. Hartley, David G. Hays, 
William Carleton Healy, Jr., Ernest Earl Heinbach, Paul Heit, Leon H. Herbach, 
G. Ronald Herd, Philip Hermann, Irene Hess, Werner Hochwald, William 
Hodgkinson, Jr., Wassily Hoeffding, Paul G. Homeyer, William C. Hood, 
Robert Hooke, John W. Hopkins, Daniel G. Horvitz, Harold Hotelling, Earl E.
Houseman, William Gerow Howe, J. Stuart Hunter, Frederick Vincent Hurst, 
Jr., Paul E. Irick, Nathan Jaspen, Raymond J. Jessen, Milton Vernon Johns, Jr.,
Howard L. Jones, Hyman B. Kaitz, Edward L. Kaplan, A. E. Karp, Marvin 
Aaron Kastenbaum, Leo Katz, Lester S. Kellogg, Oscar Kempthorne, Robert 
W. Kennard, George H. Kennedy, Jack C. Kiefer, Allyn W. Kimball, Bradford 
F. Kimball, Leslie Kish, Tosio Kitagawa, Tjalling C. Koopmans, Richard L. 
Kozelka, Charles H. Kraft, Evelyn Lucille Kramer, William Henry Kruskal, 
Roy R. Kuebler, Jr., Thomas E. Kurtz, Jack Laderman, Helen Humes Lamale 
(Mrs.), Andre G. Laurent, Fred Charles Leone, Howard Levene, Edward Abra- 
ham Lew, Everett Vernon Lewis, Gerald J. Lieberman, Gilbert Lieberman, 
Jacob E. Lieberman, Julius Lieblein, Rensis Likert, Richard F. Link, Benjamin 
Lipstein, Sebastian B. Littauer, Eugene Lukacs, Robert James Lundegard, 
George F. Lunger, Philip J. McCarthy, Brockway McMillan, Robert G. McMillan,
Gertrude A. McQuaid, Ralph L. Madison, Benjamin Malzberg, John
Mandel, Joseph Mandelson, Herbert Marshall, Frank J. Massey, Paul Meier, 
W. J. Merrill, Herbert A. Meyer, Paul L. Meyer, Paul D. Minton, Irwin Miller, 
Robert Mirsky, Azizali Farrukh Mohammed, Edward C. Molina, A. M. Mood,
Milton Morrison, Norman Morse, Joseph Edward Morton, Lincoln E. Moses, 
Jack Moshman, Frederick Mosteller, Shu-Teh Chen Moy, Merv E. Muller, 
Ray B. Murphy, Luis F. Nanni, Tadepalli Venkata Narayana, Mary G. Natrella,
Joseph Anthony Navarro, M. R. Neifeld, A. Carl Nelson, Jr., Morton J. Netzorg, 
Robert Joseph Nichol, George E. Nicholson, Jr., Wesley L. Nicholson, Nicholas 
Nikitich, Harold Nisselson, Lionel MacLean Noel, Gottfried E. Noether, Monroe 
L. Norden, Nilan Norris, Horace W. Norton, James Norton, Jr., Junjiro Ogawa,
Carl Reading Ohman, Edwin G. Olds, Ingram Olkin, Paul S. Olmstead, A. L.
O’Toole, Donald B. Owen, William R. Pabst, Jr., Emanuel Parzen, Robert E.
Patton, Edward Burton Perrin, John K. Perrin, Eugene W. Pike, John W. Pratt,
Lila Knudsen Randolph, Bayard Rankin, Stanley Reiter, George J. Resnikoff, 
Joseph S. Rhodes, Wyman Richardson, Donald L. Richter, Paul R. Rider, 
William L. Roach, Jr., Herbert E. Robbins, Helen Murray Roberts, Selby 
Lemley Robinson, Douglas S. Robson, Robert Roeloffs, Albert C. Rohloff,
Charles F. Roos, John H. Roseboom, David Rosenblatt, Harry M. Rosenblatt, 
Joan Raup Rosenblatt, Judah Isser Rosenblatt, Murray Rosenblatt, Willard C. 
Ross, S. N. Roy, Jagdish S. Rustagi, John Morris Ryan, David Sachs, Jerome
Sacks, Daniel E. Sands, F. E. Satterthwaite, L. J. Savage, Edward Sax, Henry
Scheffé, Marvin A. Schneiderman, Elizabeth L. Scott, Hilary Seal, Oliver Abbott 
Shaw, Richard H. Shaw, Sidney Shtulman, Elizabeth Anna Shuhany, Sidney 
Siegel, Walter R. Simmons, Monroe Gilbert Sirken, Rosedith Sitgreaves, Morris 
Skibinsky, Hugh Fairfield Smith, John H. Smith, Walter Laws Smith, Jean F. 
Smolak, James L. Snell, Milton Sobel, Herbert Solomon, Paul N. Somerville, 
Frederick A. Sorensen, D. E. South, Mortimer Spiegelman, B. Ralph Stauber, 
George Powell Steck, Robert G. D. Steel, Arthur Stein, Frederick F. Stephan, 
Leo Eugene Storm, Fred W. Strodtbeck, Hale C. Sweeny, Zenon Szatrowski, 
Francis B. Taylor, Henry Teicher, Dan Teichroew, Milton E. Terry, Gerhard 
Tintner, Herbert C. S. Thom, Donovan J. Thompson, William Rae Thompson,
Leo J. Tick, Mary Newton Torrey, Chia Kwei Tsao, Albert William Tucker,
John W. Tukey, Charles R. M. Tuttle, M. C. K. Tweedie, Jose Vergara, D. F.
Votaw, Jr., Helen M. Walker, David L. Wallace, W. Allen Wallis, Sidney
Weiner, Harry Weingarten, Eleanor S. Weiss, Irving Weiss, Lionel Weiss, Oscar
Wesler, Phillips Whidden, John S. White, Alfred G. Whitney, Frank Wilcoxon,
Martin B. Wilk, R. Lowell Wine, B. J. Winer, Gerald Winston, William Wolman, 
Max A. Woodbury, Charles Ashley Wright, John William Youden, Samuel Zahl, 
Royal Keith Zeigler, Marvin Zelen. 


The program follows: 


TUESDAY, DECEMBER 27, 1955 
9:00 a.m. Contributed Papers I 
Chairman: ROSEDITH SITGREAVES, Teachers College, Columbia University


Papers: 1. The Midrange of a Sample as an Estimator of the Population Midrange,
PAUL R. RIDER, Wright-Patterson Air Force Base.

2. Distribution of the Product of Maximum Values in Samples from a Rectangular
Population, PAUL R. RIDER, Wright-Patterson Air Force Base, (By title).

3. A Note on Non-recurrent Random Walks, CYRUS DERMAN, Columbia University,
(By title).

4. Statistical Spectral Analysis, I: Consistent Asymptotically Normal Estimates
of the Covariance Function and Spectral Averages, EMANUEL PARZEN,
Columbia University, (By title).

5. Statistical Spectral Analysis, II: Asymptotic Mean Square Error of a Class
of Estimates of the Spectral Density, EMANUEL PARZEN, Columbia University.

6. A Central Limit Theorem for Multilinear Stochastic Processes, EMANUEL
PARZEN, Columbia University, (By title).

7. An Extension of Cramér’s Theorem 20.6 to Random Functions with Values
in a Metric Space, EMANUEL PARZEN, Columbia University, (By title).

8. Orthogonality and Fractional Replication of Factorial Experiments, ALLAN
BIRNBAUM, Columbia University.

9. On the Second Sample Size Function of a Bayes Two-Stage Test for the Mean,
MORRIS SKIBINSKY, Purdue University.

10. A New Estimation Procedure for a Linear Combination of Exponentials
(Preliminary Report), RICHARD G. CORNELL, Oak Ridge National Laboratory
and Virginia Polytechnic Institute.

11. A Note on Weighted Randomization, D. R. COX, University of North Carolina.

12. On the Analysis of Incomplete Block Designs, MARVIN ZELEN, National Bureau
of Standards.

13. A Remark on Wald’s Paper, “On a Statistical Problem Arising in the Classification
of an Individual into One of Two Groups,” JUNJIRO OGAWA, University
of North Carolina.

14. Consistency and Optimum Properties of Some Two-Sample Tests, JULIUS
R. BLUM, Indiana University, and LIONEL WEISS, University of Virginia.


11:00 a.m. Probability and Statistics in Genetics. With Biometric Society (ENAR) 
Chairman: OSCAR KEMPTHORNE, Iowa State College
Papers: 1. Some Problems of Stochastic Processes in Genetics, M. KIMURA, University
of Wisconsin.
2. Estimation of Parameters in Genetic Models, HOWARD LEVENE, Columbia
University.
3. Sequential Tests for Detection of Linkage in Man, NEWTON E. MORTON,
University of Wisconsin.


2:00 p.m. Contributions of M. A. Girshick to Mathematical Statistics. With 
American Statistical Association 
Chairman: HENRY SCHEFFÉ, University of California


Papers: 1. Multivariate Analysis, HAROLD HOTELLING, University of North Carolina.
2. Sequential Analysis, EDWARD PAULSON, Queens College, New York City.
3. Decision Theory, DAVID BLACKWELL, University of California.


4:00 p.m. Extreme Value Theory 


Chairman: SEBASTIAN B. LITTAUER, Columbia University


Papers: 1. Statistical Theory of Fatigue Failure and Breaking Strength, E. J. GUMBEL,
Columbia University.
2. On the Problem of Forecasting Extreme Values from a Curve Fitted to the
Type I Extreme Value Distribution, BRADFORD F. KIMBALL, New York
State Public Service Commission.
3. Developments in the Application of Extreme Value Theory, JULIUS LIEBLEIN,
National Bureau of Standards.
Discussion: BENJAMIN EPSTEIN, Wayne University and Stanford University; C. H. S.
THOM, Advisory Committee on Weather Control


4:00 p.m. Components of Variance. With the American Statistical Association
Chairman: C. W. DUNNETT, American Cyanamid Company


Papers: 1. Components of Variance, Finite Populations, and Statistical Inference, H. F.
SMITH, North Carolina State College.
2. Non-additivity in a Latin Square Design, M. B. WILK and O. KEMPTHORNE,
Iowa State College.
Discussion: JOHN TUKEY, Princeton University


6:00 p.m. 1955 Council Meeting 


Chairman: HENRY SCHEFFÉ, President


WEDNESDAY, DECEMBER 28, 1955
9:00 a.m. Contributed Papers II


Chairman: M. VERNON JOHNS, Columbia University


Papers: 1. Remarks on Characteristic Functions, EUGENE LUKACS, Catholic University
of America and Office of Naval Research.

2. The Limiting Distribution of Serial Correlation Coefficient in the Explosive
Case, JOHN S. WHITE, University of Manitoba.

3. The Distribution of the Ratio of Two Measures of Normal Dispersion, H. O.
HARTLEY, Iowa State College.

4. Estimating a Linear Functional Relation, H. FAIRFIELD SMITH, North
Carolina State College.

5. Asymptotic Distribution of Roots of Certain Determinantal Equations, R.
GNANADESIKAN, University of North Carolina, (By title).

6. Investigation of the Possibility of Using Likelihood Ratio Tests of Certain
Multivariate Hypotheses for Obtaining Confidence Bounds, R. GNANADESIKAN,
University of North Carolina, (By title).

7. Asymptotic Efficiencies of a Nonparametric Life Test for Investigating Smaller
Percentiles of a Gamma Distribution, JOHN E. WALSH, Lockheed Aircraft
Corporation, (By title).

8. A Test of Judge Concordance for Paired Comparison Designs (Preliminary
Report), J. W. WILKINSON, University of North Carolina, (By title).

9. On the Efficiency of Certain Classes of Tests Based on the U-Statistics, JOAN
RAUP ROSENBLATT, National Bureau of Standards.

10. The Dynamic Statistical Decision Problem when the Component Problem
Involves a Finite Number, m, of Distributions, JAMES F. HANNAN, Michigan
State University.

11. On Certain Systems of Experiments as Interdependent Stochastic Processes
(Preliminary Report), DAVID ROSENBLATT, American University, (By
title).

12. A Spherically-Symmetric Order Statistic r, (Preliminary Report), BRIAN
GLUSS and FRED L. STRODTBECK, University of Chicago.


11:00 a.m. Statistical Studies of Accident Proneness and Contagion of Accidents. 
With the Biometric Society (ENAR) 


Chairman: M. VERNON JOHNS, Columbia University


Papers: 1. General Review of Recent Results on Accident Proneness and Contagion, J.
NEYMAN, University of California.
2. Asymptotic Tests and Power of Tests of Certain Hypotheses Regarding Contagion
in Accidents, CHARLES H. KRAFT, University of California.
3. A Limit Theorem on Conditional Distributions Related to Studies of Accident
Proneness, G. P. STECK, Sandia Base.


2:00 p.m. Rietz Lecture 
Chairman: J. NEYMAN, University of California


Paper: Probability in Statistics, WILLIAM FELLER, Princeton University.


4:00 p.m. Acceptance Sampling Plans 
Chairman: GERALD J. LIEBERMAN, Stanford University


Papers: 1. Some Continuous Sampling Plans for Complex Items, GEORGE RESNIKOFF,
Stanford University.
2. An Economic Approach to the Choice of Continuous Sampling Plans, GEOFFREY
GREGORY, Stanford University.
3. On the Construction of Optimum Double Sampling Plans, ALLAN BIRNBAUM,
Columbia University.


6:00 p.m. Business Meeting 


Chairman: HENRY SCHEFFÉ, President


8:30 p.m. Council Meeting


Chairman: DAVID BLACKWELL, President


THURSDAY, DECEMBER 29, 1955
9:00 a.m. A Training Program Leading to Contributions in Experimental Design 


Chairman: CHURCHILL EISENHART, National Bureau of Standards


Papers: 1. Who Makes Designs, W. J. YOUDEN, National Bureau of Standards.
2. The Construction of Fractional Factorial Designs for the 2^n Series, F. L. MILLER,
JR., Purdue University and National Bureau of Standards.
3. The Use of the 2^n Fractional Factorial Designs for Factors at 4 Levels, H. PETTIGREW,
George Washington University and National Bureau of Standards.
4. Some Combinatorial Relationships Arising in the Dualization of Incomplete
Block Designs, R. BURTON, National Bureau of Standards.
Discussion: GERTRUDE M. COX, North Carolina State College.


11:00 a.m. Statistics in Medical Experimentation. With the Biometric Society 
(ENAR) 


Chairman: LINCOLN E. MOSES, Stanford University


Papers: 1. Elimination of Selection Bias in Medical Experimentation, DAVID BLACKWELL
and J. L. HODGES, JR., University of California.
2. Estimation of Bacterial Densities, THOMAS S. FERGUSON, University of
California.
3. On Comparing Survival Rates, AGNES BERGER, School of Public Health,
Columbia University.


2:00 p.m. Special Invited Paper 
Chairman: HERMAN CHERNOFF, Stanford University


Paper: Stochastic Approximation, CYRUS DERMAN, Columbia University.


4:00 p.m. Recognized Needs for Mathematical Tables among Statisticians


Chairman: ALBERT BOWKER, Stanford University and Columbia University


Discussion Session 


4:00 p.m. New Developments in Experimental Social Science. With the Social 
Statistics Section of the American Statistical Association 


Chairman: FRED L. STRODTBECK, University of Chicago


Papers: 1. Monte Carlo Methods in an Experimental Test of an Interaction Model, DAVID
G. HAYS, The RAND Corporation.
2. Some Theoretical Problems of Lexico-Statistics, MORRIS SWADESH, Denver,
Colorado. (Presented by Joseph H. Greenberg, Columbia University.)
3. Mathematical Models for the Empirical Study of Decision-Making, PATRICK
SUPPES, Stanford University.
Discussion: HERBERT SOLOMON, Columbia University.
LEONARD J. SAVAGE, University of Chicago.
DAVID R. COX, University of Cambridge.
JOSEPH H. GREENBERG, Columbia University.


9:30 p.m. Informal Party. With American Statistical Association 
FRIDAY, DECEMBER 30, 1955 
9:00 a.m. Contributed Papers III 


Chairman: S. N. ROY, University of North Carolina


Papers: 1. Generalized Normalization Polynomials, D. TEICHROEW, University of
California at Los Angeles.

2. Tables for Computing Bivariate Normal Probabilities, DONALD B. OWEN,
Sandia Corporation.

3. Bounds and Approximations for Constants Used in Quality Control, J. T.
CHU, University of North Carolina and Case Institute of Technology.

4. Four Streams of Traffic Converging on a Cross-Road, BRIAN GLUSS, University
of Chicago, introduced by FRED L. STRODTBECK, University of
Chicago, (By title).

5. Markov Processes Arising in Learning Models, JOHN G. KEMENY and
J. L. SNELL, Dartmouth College.

6. On a Decision Rule for Selecting a Group Containing the Population with
the Largest Mean (Preliminary Report), R. C. BOSE and S. S. GUPTA,
University of North Carolina, (By title).

7. Recurrent Values of Sums of Independent Random Variables (Preliminary
Report), LOUIS J. COTE and HENRY TEICHER, Purdue University.

8. A Problem Involving the Distribution of Shadows (Preliminary Report),
HERMAN CHERNOFF, Stanford University, and JOSEPH F. DALY,
National Bureau of the Census.

9. Note on Two-Stage Test Procedures, S. G. GHURYE, Lucknow University,
(By title).

10. Some Properties of Generalized Sequential Probability Ratio Tests, J.
KIEFER, Cornell University, and LIONEL WEISS, University of Virginia.

11. Sequential Decision Problems for a Class of Stochastic Processes. Testing
Hypotheses (Preliminary Report), A. T. BHARUCHA-REID, University of
California.

12. Note on a Markov Chain with Matrix States and Some Applications, A. T.
BHARUCHA-REID, University of California, (By title).

13. On the Comparison of Two Stochastic Epidemics, A. T. BHARUCHA-REID,
University of California, (By title).

14. A Sequential Multiple Decision Procedure for Ranking Means of Normal
Populations with a Common Unknown Variance (Preliminary Report),
R. E. BECHHOFER, Cornell University, and M. SOBEL, Bell Telephone
Laboratories, (By title).

15. A Scale-Invariant Sequential Multiple Decision Procedure for Ranking
Variances of Normal Populations (Preliminary Report), R. E. BECHHOFER,
Cornell University, and M. SOBEL, Bell Telephone Laboratories,
(By title).

16. Exact Probabilities in a Test for Markoff Dependency, REED B. DAWSON,
JR., Dept. of Defense.

17. A Problem in Combinatory Analysis and its Applications to Probability
Theory, T. V. NARAYANA, Institut H. Poincaré and McGill University,
introduced by HAROLD HOTELLING.

18. The Bayesian Inference Problem in Stochastic Systems, MAX A. WOODBURY,
George Washington University.


10:30 a.m. Methodology of Studying Motivation. With American Statistical 
Association 


Chairman: FREDERICK MOSTELLER, Harvard University


Papers: 1. Some Statistical Aspects of the Q Technique, R. R. BAHADUR and D. L.
WALLACE, University of Chicago.
2. Industrial Mobility of Labor as a Probability Process, ISADORE BLUMEN,
MARVIN KOGAN, and PHILIP McCARTHY, Cornell University.
Discussion: WILLIAM STEPHENSON, Greenwich, Connecticut; ANDREW BAGGALEY, University
of Wisconsin; LEO GOODMAN, University of Chicago


11:00 a.m. Unpublished Mathematical Tables of Interest to Statisticians 


Chairman: DAN TEICHROEW, National Cash Register Company, Dayton, Ohio


Discussion Session 


2:00 p.m. Contributed Papers IV 
Chairman: EMANUEL PARZEN, Columbia University


Papers: 1. Some Nonparametric Generalizations of Multivariate Analysis and Analysis
of Variance, S. N. ROY, University of North Carolina.

2. “No Interaction” in a Three-Way Table, MARVIN A. KASTENBAUM, University
of North Carolina.

3. On Bartlett’s Test of Complex Contingency Table Interaction, SUJIT KUMAR
MITRA, University of North Carolina, (By title).

4. A Theorem in Minimum Chi Square, SUJIT KUMAR MITRA, University of
North Carolina, (By title).

5. Sequential Estimation from a Finite Population, HERBERT DAVID and
INGRAM OLKIN, University of Chicago.

6. Further Remarks on Measures of Association for Cross-Classifications,
LEO A. GOODMAN, University of Chicago, and WILLIAM H. KRUSKAL,
Universities of California and Chicago.

7. Uniformly Consistent Sequences of Multiple-Decision Rules, WILLIAM
HALL, University of North Carolina, (By title).

8. Some Hypergeometric Series Distributions Occurring in Birth-and-Death
Processes at Equilibrium (Preliminary Report), WILLIAM HALL, University
of North Carolina, (By title).

9. Some General Aspects of Stochastic Approximations, TOSIO KITAGAWA,
Iowa State College.

10. The Analysis of Incidence Rates under Multiple Classifications of the Population
(Preliminary Report), WYMAN RICHARDSON, University of North
Carolina.

11. Estimation of Percentiles by Order Statistics, A. E. SARHAN, University of
North Carolina, (By title).

12. On Renewal Theory, Counter Problems, and Quasi-Poisson Processes,
WALTER L. SMITH, University of North Carolina, (By title).

13. On the Construction of Significance Tests on the Circle and the Sphere,
G. S. WATSON, The Australian National University, Canberra, A.C.T.,
Australia, (By title).

14. Estimation of Individual Variations in an Unreplicated Two-Way Classification
(Preliminary Report), THOMAS S. RUSSELL and RALPH A. BRADLEY,
Virginia Polytechnic Institute, (By title).

15. Empirical Bayes Estimation (Preliminary Report), M. V. JOHNS, JR.,
Columbia University.


A. BIRNBAUM
Associate Secretary 


MINUTES OF THE ANNUAL BUSINESS MEETING, 1955 


A business meeting was called to order at 6:15 p.m., December 28, 1955 in the 
Music Room of the Biltmore Hotel, New York City by President Henry Scheffé. 
Approximately 96 members were present. John Tukey reported on the activities 
of the Institute Committee on Mathematical Tables. An ad hoc committee to 
acquaint statisticians with the use of high speed calculators for statistical analysis 
was announced. R. L. Anderson is chairman of this committee. The Treasurer’s 
report was presented and approved. The Secretary’s report was presented and 
approved. The tellers were instructed to accept ballots from members who had 
not returned them by mail. The Editor’s report was presented and accepted.
The President presented his report and turned over the chair to the new presi- 
dent, David Blackwell. President Blackwell thanked the outgoing president for 
his work for the Institute during the past year. A discussion of the advisability 
of holding Institute of Mathematical Statistics meetings in hotels was con- 
ducted as new business. The President was instructed to appoint a committee or 
to take such other action as he deems suitable to improve physical facilities if 
meetings are held in hotels. 

The tellers announced the election of the following:

President Elect
Alexander M. Mood

Members of IMS Council for term 1955-1958
R. C. Bose                Oscar Kempthorne
Churchill Eisenhart       W. J. Youden

The meeting was adjourned at 7:15 p.m.

G. E. NICHOLSON, JR.,
Secretary


REPORT OF THE PRESIDENT OF THE INSTITUTE FOR 1955 


The Institute of Mathematical Statistics has by tradition made the last task 
of the retiring President a review of the year’s activity by our society. In 1955 
we held in the United States two national and two regional meetings. Besides 
the present national meeting in New York another was held at Ann Arbor in 
August and September. An Eastern regional meeting was held at Chapel Hill in 
April, and a Western regional meeting at Berkeley in July during the time of the 
Third Berkeley Symposium on Mathematical Statistics and Probability. Because 
of the Ann Arbor meeting no Central regional meeting was held, but the Central 
Regional Program Committee worked on plans for a meeting in Chicago in 
April, 1956. The Program Committees for these meetings are listed with other 
committees in the appendix to this report; their chairmen, who bore so much of 
the responsibility, were Herbert Solomon for this meeting, Carl F. Kossack for 
the Ann Arbor meeting, G. E. Noether for the Chapel Hill meeting, and Herman 
Rubin for the Berkeley meeting; Theodore A. Bancroft was chairman of the 
Central Regional Program Committee. Our meetings and the Annals of Mathe- 
matical Statistics are enriched by Special Invited Papers; the chairman of the 
Committee that selects authors for these was Herman Chernoff. A new service 
to our profession may be offered at future meetings in the form of an employment 
register; the question has been studied by a committee under the chairmanship 
of Gerald J. Lieberman. 

A new policy on meetings was adopted by the Council in September. A Com- 
mittee to Explore the Desirability of Changing the Time of Winter meetings had 
been set up last year under the chairmanship of C. C. Craig in response to 
dissatisfaction expressed by some of our members. On the basis of a sample 
survey of our North American members the committee judged there was very 
strong support among our members in all parts of North America for all four of 
the following propositions: (1) There should be only one national meeting a year. 
(2) This should alternate between the meeting places of national meetings of the 
American Mathematical Society and the American Statistical Association. (3) 
There should be a strengthened program of regional meetings. (4) The preferred
time of meeting is in the first half of September. The Council adopted a tentative
policy for the next three years embracing these four propositions. 

Since abstracts heretofore have been published in connection with meetings, 
the new policy implying fewer and more irregularly timed meetings raised ques- 
tions about separating the publication of abstracts from meetings. This problem 
has been considered by a committee under the chairmanship of Howard Levene. 
Their recommendations have been adopted by the Council and include publica- 
tion of abstracts in every issue of the Annals, soon after their receipt. 

This year our society organized its first Summer Statistical Institute. It was 
on the topic “Statistical inference in stochastic processes,” was also held in 
Berkeley during the time of the Third Symposium, and was financed by the 
National Science Foundation. The idea of the Summer Institutes is to bring 
together a number of researchers on some specialty for an extended time, to 
interact with each other to the advancement of the specialty. The benefits hoped 
for are long range, there being no publications by the Summer Institute as such. 
David Blackwell was chairman of the Organizing Committee for the 1955 Summer 
Institute. We hope to run another in 1957. 

Discussion of the 1955 Volume of the Annals of Mathematical Statistics and 
its outlook for 1956 I have left to the report by our Editor, T. E. Harris. 

Another publishing activity of which we can be very proud is the preparation 
of the Wald Memorial Volume. This collection of papers by Abraham Wald was 
published under our sponsorship by the McGraw-Hill Book Company. Chairman 
of the Editorial Committee for the book was T. W. Anderson. 

A further publishing activity has been under consideration by the Committee 
on Activities and Development under the chairmanship of T. W. Anderson: The 
University of Chicago is contemplating publication of a series of monographs on 
mathematical statistics, and our committee has studied possible forms of coop- 
eration by the Institute. 

You have heard about our growth in membership during the year and about 
our financial state in the reports by our Secretary, George E. Nicholson, Jr., and 
our Treasurer, Albert H. Bowker. Our Individual Memberships Committee under 
the chairmanship of Eugene W. Pike has been successfully active, campaigning 
this year especially for graduate students. We gathered no new Institutional 
Memberships; I suggest every member of the Institute ask himself whether or 
not he is in a good position to promote one of these, and then follow his conscience. 

The former Advisory Committee on Statistical Computations was rejuvenated 
this year and renamed the IMS Committee on Mathematical Tables. This large 
committee has been very active under the chairmanship of John W. Tukey in 
exploring plans for new kinds of service to statisticians. The problems concern 
the kinds of tables needed, who might compute them, and forms of publication; 
you will continue to hear about them through this committee. An offshoot will 
be a new committee on fast machines. 

Of especial interest to our members in government service and to those training 
statisticians is the work of the Committee on Professional Standards of Statisticians
in Government Service under the chairmanship of Bradford F. Kimball. 
The committee is drafting a proposed letter to state personnel boards from the 
committee, suggesting schedules of duties that can be performed by statisticians 
at various levels, and corresponding schedules of qualifications. 

A Committee to Study the Possibilities of Closer Cooperation with Other 
Societies under the chairmanship of Samuel S. Wilks has made various specific 
suggestions, including some about more joint meetings, which will come before 
the new Council. 

The two members who did the most for our society in the last three years, both 
in volume and importance of their work, retired from office last summer. They 
are Kenneth J. Arnold, our ex-Secretary-Treasurer, and Erich L. Lehmann, our 
ex-Editor. Their achievements were memorialized by resolutions passed at the 
Ann Arbor meeting, which are printed in this issue of the Annals. I might add 
that Arnold is in effect being replaced by three people, our Secretary, George E. 
Nicholson, Jr., our Treasurer, Albert H. Bowker, and our Program Coordinator, 
Leo Katz. Beginning this year the Council separated the position of Program 
Coordinator from that of Program Chairman of the Annual Meeting, with the 
intention of somewhat lightening the Secretary’s burden: The Program Coordi- 
nator acts as a kind of secretary for meetings in general. The Council also de- 
cided to make the Program Coordinator an Associate Secretary, thus including 
him in the Council. Our new Editor, T. E. Harris, was the happy choice of a 
committee under the chairmanship of Samuel S. Wilks, which canvassed the 
possibilities, personal and institutional, for the Editorship. We are grateful to 
Michigan State University, the University of California, the University of North 
Carolina, Stanford University, and The RAND Corporation for encouraging 
Arnold, Lehmann, Nicholson, Bowker, and Harris to accept these offices, for 
providing space, and for other forms of assistance. 

The Committee on Fellows, under the chairmanship of William G. Madow, 
in addition to its usual work prepared a report to the Council on standards for 
election to fellowship. I am pleased to announce that on nomination by this com- 
mittee the Council has elected the following members to be Fellows: 


K. J. Arnold 

G. E. P. Box 
Gustav Elfving 

M. H. Quenouille 
Murray Rosenblatt 
Herbert Solomon 


As the next Nominating Committee I have appointed the following members, 
who have accepted: 
Frederick Mosteller, Chairman 
Jerome Cornfield 
Harald Cramér 
David G. Kendall 
Brockway McMillan 





I take this opportunity to express my thanks, personally and in behalf of our 
society, to all those who have faithfully served the Institute: the other officers, 
the Council members and committee members, our representatives, and the 
referees for the Annals. The referees are listed in the report of the Editor, the 
others in the following appendix. 


Henry Scheffé 


President 
December 28, 1955 


Appendix. Committees of the Institute, 1955 


I. The Council and Committees of the Council 
(a) Elected members of the Council 
Term expires 1955 Term expires 1956 
W. G. Cochran T. W. Anderson, Jr. 
Churchill Eisenhart Joseph Berkson 
Henry Scheffé Z. W. Birnbaum 
J. W. Tukey David Blackwell 
W. G. Madow 
Term expires 1957 
R. L. Anderson 
Leo Goodman 
P. G. Hoel 
L. J. Savage 
Herbert Solomon 
(b) Executive Committee 
President: Henry Scheffé 
President-Elect: David Blackwell 
Secretary: K. J. Arnold (term expired June 30, 1955) 
    George E. Nicholson, Jr. (term began July 1, 1955) 
Treasurer: K. J. Arnold (term expired March 31, 1955) 
    Albert H. Bowker (term began April 1, 1955) 
Editor: E. L. Lehmann (terminated July 31, 1955) 
    T. E. Harris (term began August 1, 1955) 
Committee on Fellows 
Term expires 1955 Term expires 1956 
W. G. Madow, Chairman David Blackwell 
Edward Paulson Howard Levene 
Term expires 1957 
M. S. Bartlett 
L. J. Savage 
Associate Secretaries 
Evelyn Fix 
W. H. Kruskal 
Lionel Weiss 
Associate Treasurer 
E. S. Pearson 
Associate Editors 
Z. W. Birnbaum (term began August 1, 1955) 
David Blackwell (term expired July 31, 1955) 
Herman Chernoff (term began August 1, 1955) 
H. E. Daniels (term expired January 1, 1956) 
W. J. Dixon (term began August 1, 1955) 





J. L. Hodges, Jr. (term expired July 31, 1955) 
J. M. Hammersley (term began January 1, 1956) 
Wassily Hoeffding (term expired July 31, 1955) 
W. G. Madow (term expired July 31, 1955) 
L. J. Savage (term began August 1, 1955) 
J. Wolfowitz 
II. Editorial Committee 

(a) T. E. Harris, Editor 

(b) Associate Editors, listed immediately above 

(c) Cooperating Members 
Z. W. Birnbaum D. A. Darling G. E. Noether 
R. C. Bose J. L. Doob M. Peisakoff 
G. E. P. Box T. E. Harris H. E. Robbins 
Herman Chernoff Paul G. Hoel L. J. Savage 
Kai Lai Chung J. Kiefer Charles M. Stein 
D. R. Cox William H. Kruskal Lionel Weiss 
J. F. Daly Solomon Kullback Max A. Woodbury 


Appointed Committees 
III. Program Committees 

(a) December Meeting—New York City 
Herbert Solomon, Chairman Donald A. S. Fraser 
Milton V. Johns, Jr., Asst. Sec. Oscar Kempthorne 
Kenneth J. Arrow Melvin P. Peisakoff 

S. N. Roy 

(b) September Meeting—Ann Arbor 
Carl F. Kossack, Chairman Horace W. Norton 
W. H. Kruskal, Assoc. Sec. Paul R. Rider 
P. S. Dwyer, Asst. Sec. Murray Rosenblatt 
Henry B. Mann 

(c) Eastern Region 
G. E. Noether, Chairman Ralph A. Bradley 
Lionel Weiss, Assoc. Sec. Glenn L. Burrows 
Wassily Hoeffding, Asst. Sec. J. Edward Jackson 
Robert E. Bechhofer 

(d) Central Region 
Theodore A. Bancroft, Chairman Donald A. Darling 
Allen T. Craig Henry Teicher 

(e) Western Region 
Herman Rubin, Chairman W. J. Dixon 
Evelyn Fix, Assoc. Sec. Theodore E. Harris 
Charles H. Kraft, Asst. Sec. Stanley W. Nash 
Z. W. Birnbaum 

(f) Program Coordinator: Leo Katz (ex officio member of all Program Committees) 

(g) Special Invited Paper Committee 
Herman Chernoff, Chairman Frederick Mosteller 
Theodore E. Harris Max A. Woodbury 
Wassily Hoeffding E. L. Lehmann (ex officio) 

G. E. Nicholson, Jr. (ex officio) 
IV. Promotional Committees 

(a) Individual Memberships 
Eugene W. Pike, Chairman James L. Dolby 
W. D. Baten Harry M. Hughes 
Carl A. Bennett 








(b) Academic Institutional Memberships 


Boyd Harshbarger, Chairman Herbert Robbins 

Gerald J. Lieberman Murray Rosenblatt 
(c) Non-Academic Institutional Memberships 

Brockway McMillan, Chairman F. W. Dresch 

Cuthbert Daniel Alexander M. Mood 


V. Other Committees 
(a) Nominating Committee 
(Appointed by 1954 President E. G. Olds) 


W. J. Dixon, Chairman H. B. Mann 

M. A. Girshick H. Nisselson 

M. G. Kendall A. W. Tucker 

(b) Committee on Mathematical Tables 

John W. Tukey, Chairman W. J. Dixon 
Albert H. Bowker, Vice-Chairman J. L. Hodges, Jr. 
D. Teichroew, Secretary William Kruskal 
R. L. Anderson H. O. Hartley 
Robert E. Bechhofer J. W. Hopkins 
C. I. Bliss Jack Moshman 


Max A. Woodbury 


(c) Committee on Activities and Development 
T. W. Anderson, Chairman William Kruskal 
Joseph Berkson Samuel S. Wilks 


Albert H. Bowker 

(d) Committee to Nominate a New Editor 
Samuel S. Wilks, Chairman T. W. Anderson 
Mina Rees 


(e) Committee to Explore the Desirability of Changing Time of Winter Meetings 
C. C. Craig, Chairman Carl F. Kossack 
R. L. Anderson R. B. Murphy 


J. L. Hodges, Jr. 
(f) Committee on Professional Standards of Statisticians in Government Service 


B. F. Kimball, Chairman A. S. Householder 
Robert W. Burgess Joseph Lev 
Besse B. Day Herbert Marshall 
Churchill Eisenhart Robert E. Patton 
G. M. Harrington John E. Walsh 
(g) Committee to Study the Possibilities of Closer Cooperation with Other Societies 
Samuel S. Wilks, Chairman Edwin G. Olds 
William G. Cochran 
(h) Committee on Committee Procedures 
Horace W. Norton, Chairman Morris H. Hansen 
K. J. Arnold G. E. Nicholson, Jr. (ex officio) 
Leo A. Goodman Frederick F. Stephan 
(i) Committee to Consider the Feasibility of Keeping an Employment Register at 
IMS Meetings 
Gerald J. Lieberman, Chairman I. R. Savage 
David Blackwell 
(j) Committee to Study the Possibilities of Arranging a U. S. Visit by Russian 
Probabilists 
David Blackwell, Chairman Jerzy Neyman 
J. L. Doob Herbert Robbins 


Eugene Lukacs 





(k) Organizing Committee for Summer Statistical Institute in 1955 
David Blackwell, Chairman Herbert Robbins 
T. E. Harris 
(l) Planning Committee for a Summer Statistical Institute in 1956 
Herbert Robbins, Chairman Milton Sobel 
Jerzy Neyman John W. Tukey 
L. J. Savage 
(m) Committee to Consider Desirability of a Summer Statistical Institute in 1956 
Frederick Mosteller, Chairman John W. Tukey 
Cuthbert Daniel D. F. Votaw, Jr. 
Wassily Hoeffding William J. Youden 
L. J. Savage 
(n) Editorial Committee for the Wald Memorial Volume 
T. W. Anderson, Chairman E. L. Lehmann 
Harald Cramér Alexander M. Mood 
Harold A. Freeman Charles M. Stein 
J. L. Hodges, Jr. 
(o) Committee to Consider Policy on Publishing Abstracts 
Howard Levene, Chairman Wassily Hoeffding 
T. W. Anderson Jack C. Kiefer 
(p) Finance Committee 
Mortimer Spiegelman, Chairman A. H. Bowker (ex officio) 
K. J. Arnold (ex officio) John E. Walsh 
(q) Advisory Committee on Physical Facilities for Meetings 
Z. W. Birnbaum, Chairman George E. Nicholson, Jr. 
Leo Katz (Program Coordinator) 
(r) Committee on Exchanges 
Paul S. Dwyer, Chairman G. E. Nicholson, Jr. (ex officio) 
K. J. Arnold (ex officio) Albert H. Bowker (ex officio) 
E. L. Lehmann (ex officio) T. E. Harris (ex officio) 
(s) Committee to Re-examine the Constitution and By-Laws 
William G. Cochran, Chairman Tjalling C. Koopmans 
T. W. Anderson Henry Scheffé (ex officio) 
Arnold Court Samuel S. Wilks 
(t) Committee to Consider the Format of the Annals 
Paul G. Hoel, Chairman Alexander M. Mood 
George W. Brown 


Representatives of the Institute for 1955 


To the American Association for the Advancement of Science 
Harold Hotelling 
To the National Research Council, Division of Mathematics 
Samuel S. Wilks 
To the Policy Committee for Mathematics 
Joseph F. Daly 
To the Advisory Committee of American Standards Association concerning ISO/TC 69, 
Statistical Treatment of Series of Observations 
Howard Raiffa 
To the Inter-Society Committee on the Mathematical Training of Social Scientists 
W. G. Madow, T. W. Anderson 





RESOLUTIONS OF THE INSTITUTE 


The following resolutions were voted at the Ann Arbor meeting of the Insti- 
tute of Mathematical Statistics, September 1, 1955. 


1) WHEREAS, Kenneth J. Arnold has ably and faithfully served the 
Institute of Mathematical Statistics in the dual capacity of Secretary- 
Treasurer from July 1952 through June 1955, and has now retired from this 
position; and 

WHEREAS, in performing the duties of this office Professor Arnold has 
made great personal sacrifices; and 

WHEREAS, besides handling the duties of Treasurer with diligence and 
foresight, and performing the regular Secretarial duties with competence, 
tact, and sympathetic consideration for the positions of others, Professor 
Arnold has made a signal contribution to the efficient operation of the Insti- 
tute by preparing a Codification of Actions of the Council and a Manual for the 
Guidance of Officers, Standing Committees, and Representatives of the Insti- 
tute: Therefore be it 

RESOLVED, that Members of the Institute of Mathematical Statistics 
at this Sixty-seventh Meeting record their gratitude and appreciation of the 
high efficiency, faithfulness, good will, and vision with which Professor 
Arnold discharged the duties of Secretary-Treasurer; and be it 

RESOLVED further, that the President of the Institute of Mathematical 
Statistics be instructed to forward a copy of this Resolution to John A. 
Hannah, LL.D., President of the Michigan State University of Agriculture 
and Applied Science. 


2) RESOLVED, that the membership of the Institute of Mathematical 
Statistics expresses its thanks to the retiring editor, Erich L. Lehmann, for 
his able discharge of the difficult and time-consuming duties of his office, and 
for the continuing development of the Annals of Mathematical Statistics under 
his leadership. 

RESOLVED further, that the President of the Institute of Mathematical 
Statistics be instructed to forward a copy of this Resolution to Dr. Clark 
Kerr, Chancellor of the University of California at Berkeley. 




REPORT OF THE SECRETARY OF THE INSTITUTE FOR 1955 


During 1955 the Institute held its sixty-fifth through sixty-eighth meetings. 
Business meetings were held during the sixty-seventh (seventeenth summer) 
meeting and the sixty-eighth (eighteenth annual) meeting. The Program Com- 
mittees are to be congratulated on the excellent programs which have been ar- 
ranged under the immediate direction of T. A. Bancroft, Carl F. Kossack, G. E. 
Noether, Herman Rubin and Herbert Solomon with the overall guidance of our 
Program Coordinator, Leo Katz. The Assistant Secretaries, P. S. Dwyer, Wassily 
Hoeffding, Milton V. Johns, Jr., and Charles H. Kraft, are to be congratulated 
on the physical arrangements, and the Associate Secretaries, Allan Birnbaum, 
Evelyn Fix, and W. H. Kruskal, on their performance of the duties of the Secre- 
tary with respect to these meetings. 
In November a supplement to the directory of October 15, 1954, including 
changes of address and new members as of October 15, 1955, was issued. 
G. E. Nicholson, Jr., 


Secretary 
December 28, 1955 




REPORT OF THE EDITOR OF THE ANNALS FOR 1955 


The 1955 volume of the Annals contained 76 papers, of which 16 were notes. 
This brought the total number of pages, counting miscellaneous material, to 785, 
a decrease of 41 pages below the previous year. The number of papers submitted 
in the year ending November 1, 1955, was the same as the number submitted 
in the preceding twelve-month period. The backlog of accepted manuscripts has 
grown and now amounts to about one and one-half issues, with some possibility 
of further growth. It therefore seems desirable, in spite of the rise of printing cost 
rates in 1955, to use about 900 pages in 1956. 

In accordance with a Council decision, the Annals will cease to be copyrighted 
in 1956. Beginning with the March or June issues, names of previous Editors 
will be listed on the front inside cover, and certain other changes in the inside 
cover format may be made. 

The Editor wishes to thank the previous Editor, E. L. Lehmann, for his gen- 
erous cooperation, and to acknowledge gratefully the work of David Blackwell, 
H. E. Daniels, J. L. Hodges, Jr., Wassily Hoeffding, W. G. Madow, and J. 
Wolfowitz who have continued to act as Associate Editors on manuscripts sub- 
mitted during the previous term. 

Many thanks are due to the Cooperating Members, old and new, and to the 
following people (other than Cooperating Members) for very generous refereeing 
assistance, with apologies to any who are inadvertently omitted: R. L. Anderson, 
T. W. Anderson, R. R. Bahadur, G. Baxter, R. Bechhofer, A. Birnbaum, J. 
Blum, D. Chapman, W. Conner, D. R. Cox, G. B. Dantzig, M. Donsker, M. 
Dwass, C. Eisenhart, B. Epstein, T. Ferguson, Evelyn Fix, D. Fraser, I. J. Good, 
L. Goodman, H. Kahn, E. Kaplan, O. Kempthorne, D. G. Kendall, L. LeCam, 
J. Lieblein, E. Lukacs, H. B. Mann, A. Marshall, B. McMillan, E. Paulson, R. L. 
Plackett, E. S. Pearson, J. Putter, D. Ray, J. Riordan, Joan R. Rosenblatt, M. 
Rosenblatt, H. Rubin, H. Scheffé, M. Sobel, F. Spitzer, D. Truax, H. Tucker, 
J. Tukey, A. M. Walker, D. Wishart. 

The Editor is especially indebted to Patricia Rice for secretarial work, and to 
Berniece Johnson, Dorothy Stewart, and Helena Williams for carrying out the 
editorial work. 
T. E. Harris 
Editor 


December 28, 1955 




PUBLICATIONS RECEIVED 


French Bibliographical Digest (Part I: Pure Mathematics), Series 2, No. 14, July, 1955, The 
Cultural Division of the French Embassy, New York, 128 pp. (Free of charge upon 
request). 

Klein, L. R. and Goldberger, A. S., An Econometric Model of the United States 1929-1952, 
North-Holland Publishing Co., Amsterdam, 1955, xv + 165 pp., $4.50. 

Mathematical Models of Human Behavior (Proceedings of a Symposium), 1955, Dunlap and 
Associates, Inc., Stamford, Connecticut, vii + 103 pp. 

Miller, Herman P., Income of the American People, John Wiley and Sons, Inc., New 
York, 1955, 206 pp., $5.50. 








JOURNAL OF THE 
AMERICAN STATISTICAL ASSOCIATION 


March, 1956 
1108 16th St., N.W. Washington 6, D. C. VOL. 51 NO. 273 


Tabular Analysis of Factorial Experiments and the Use of Punch Cards 
    J. R. Bainbridge, Alison M. Grant, and U. Radok 
A Test of the Accuracy of a Production Index 
    Charles F. Carter and Mary Robson 
On Simplifications of Sampling Design Through Replication with Equal Probabilities and Without Stages 
    W. Edwards Deming 
Investigating the Properties of a Sample Mean by Employing Random Subsample Means 
    Howard L. Jones 
Table of Percentage Points of Kolmogorov Statistics 
    Leslie H. Miller 
Some Theoretical Aspects of the Lot Plot Sampling Inspection Plan 
    Lincoln E. Moses 
Estimates of Bounded Relative Error for the Ratio of Variances of Normal Distributions 
    Stanley Reiter 
Multiple Regression With Missing Observations Among the Independent Variables 
    George L. Edgett 
The Operating Characteristic Curve for Sequential Sampling by Variables When the Producer’s and Consumer’s Risks Are Equal 
    Norman R. Garner 
Statistician and Policy Maker: A Partnership in the Making 
    Werner Z. Hirsch 
BOOK REVIEWS AND STATISTICAL ABSTRACTS 


THE AMERICAN STATISTICAL ASSOCIATION INVITES 
AS MEMBERS ALL PERSONS INTERESTED IN: 
1. Development of new theory and method 
2. Improvement of basic statistical data 
3. Application of statistical methods to practical problems. 





ESTADISTICA 


Journal of the Inter American Statistical Institute 


Volume XIII, No. 48-49 Contents September-December 1955 


Next Tasks in the Measurement of Production and Productivity 
    Irving H. Siegel 
Estadísticas Migratorias Internacionales: Situación Actual, Aplicación y Futuro Desarrollo de las Recomendaciones Internacionales 
    Oficina de Estadística de las Naciones Unidas 
Procedimiento de Interpolación para la Construcción de una Tabla Completa de Mortalidad 
    Roque García-Frías 
Procedimientos de Recopilación y Clases de Datos de las Estadísticas Industriales Continuas en Argentina 
    Dirección Nacional del Servicio Estadístico 
Subsídios para o Estudo dos Fatôres Determinantes e Consequências das Variações Demográficas no Brasil 
    Germano G. Jardim 
Las Cuentas de las Corrientes Monetarias (Estados Unidos) 
    Daniel H. Brill 
Progress in National Income Accounting of Honduras—Progreso en la Contabilidad del Ingreso Nacional de Honduras 
    Banco Central de Honduras 
Vital Statistics of Panama—Estadísticas Vitales de Panamá 
    Carmen A. Miró 
Importancia Metodológica del Censo del Canadá de 1951: Un Elogio del Informe Administrativo de la DBS 
    Ricardo Luna Vegas 
... de la Comisión “B” sobre la Estadística del Seguro Social, Seminario Interamericano de Seguridad 
Social, Panamá, 1954 


Institute Affairs. Statistical News. Publications. 


Published quarterly Annual subscription price $3.00 (U. S.) 


INTER AMERICAN STATISTICAL INSTITUTE 


Pan American Union 
Washington 6, D.C. 





ECONOMETRICA 


Journal of the Econometric Society 
Contents of Vol. 24, No. 1 - January, 1956 


L. Juréen    Long-Term Trends in Food Consumption 
M. Boiteux    Sur la gestion des Monopoles Publics astreints à l’équilibre budgétaire 
Murray Kemp    The Relation Between Changes in International Demand and the Terms of Trade 
Robert L. Basmann    A Theory of Demand with Consumer's Preferences Variable 
Tatsuo Kose    Solutions of Saddle Value Problems by Differential Equations 
Tapas Majumdar    Preference, Choice and the Theory of Games 
S.S.R.C. Committee Report 
    Report of Inter-Society Committee on the Mathematical Training of Social Scientists 


Book Reviews 


Elements of Pure Economics (Leon Walras). Review by Robert Solow 
Contributions of Survey Methods to Economics (L. R. Klein, Ed.). Review by M. J 
Operations Research for Management (Joseph F. McCloskey and Florence N. Trefethen, Eds.). Review by 
man 
Distributed Lags and Investment Analysis (L. M. Koyck). Review by Paul Boschan 
Price-Determining Factors in American Tobacco Markets (Hendrieke Goris). Review by Elmo L. Jackson 
Les Fondements Comptables de la Macro-Economique, Les Equations Comptables entre Quantités Globales et 
leurs Applications (M. Allais). Review by Walter Froehlich 
Probleme der statistischen Methodenlehre (O. Anderson). Review by L. Schmetterer 
Biometrika Tables for Statisticians, Volume I (E. S. Pearson and H. O. Hartley). Review by A. Hald 
Introduction à l’analyse macro-économique. Tome 1. Les origines (Jean-Claude Antoine). Review by Pierre 
aulet 
Die volkswirtschaftliche Gesamtrechnung (Werner Hofman). Review by Hans Peter 
Il Monopolio nella Teoria Economica (Siro Lombardini). Review by Pietro Castiglioni 


Published Quarterly Subscription rates available on request 
The Econometric Society is an international society for the advancement of economic theory in its 
relation to statistics and mathematics 
Subscriptions to Econometrica and inquiries about the work of the Society and the procedure in applying 
for membership should be addressed to Richard Ruggles, Secretary, The Econometric Society, Box 
1264, Yale University, New Haven, Connecticut. 


BIOMETRIKA 


Volume 42 Contents Parts 3 and 4, December 1955 


CHAPMAN, D.G. Population estimation based on change of composition caused by a selective removal. 
WAUGH, W. A. O’N. An age-dependent birth and death process. ANDREWS, F. C. & CHERNOFF, 
H. A large-sample bioassay design with random doses and uncertain concentration. HANNAN, E. J. 
An exact test for correlation between time series. WATSON, G.S. Serial correlation in regression analysis. 
I. GANI, J. Some theorems and sufficiency conditions for the maximum-likelihood estimator of an unknown 
parameter in a simple Markov chain. WILLIAMS, E. J. Significance tests for discriminant functions 
and linear functional relationships. YATES, F. A note on the application of the combination of 
probabilities test to a set of 2 x 2 tables. YATES, F. The use of transformations and maximum likelihood 
in the analysis of quantal experiments involving two treatments. STUART, A. A test for homogeneity of 
the marginal distributions in a two-way classification. HABERMAN, S. Distributions of Kendall’s tau 
based on partially ordered systems. SIMON, H. A. On a class of skew distribution functions. GHOSH, 
M. N. Simultaneous tests of linear hypotheses. HUITSON, A. A method of assigning confidence limits 
to linear combinations of variances. TUKEY, J. W. Interpolations and approximations related to the 
normal range. BRADLEY, R. A. Rank analysis of incomplete block designs. III. Some large-sample 
results on estimation and power for a method of paired comparisons. MOHAN, C. The gambler’s ruin 
problem with correlation. ARMSEN, P. Tables for significance tests of 2 x 2 contingency tables. 
Miscellanea—Contributions by David, H. A., Cohen, A. C., Haldane, J. B. S., Chu, J. T., Leslie, P. H., 
Page, E. S., Stuart, A., James, G. S., Good, I. J., Huzurbazar, V. S. 


Reviews Other books received 


The subscription price, payable in advance, is 45s. inland, 54s. export (per volume including postage). Cheques 
should be drawn to Biometrika and sent to “The Secretary, Biometrika Office, Department of Statistics, 
University College, London, W.C. 1.” All foreign cheques must be in sterling and drawn on a bank 
having a London agency. 





MATHEMATICAL REVIEWS 


A journal containing reviews of the mathematical liter- 
ature of the world, with full subject and author indices 


Publication of this journal is sponsored by the American Mathe- 
matical Society, Mathematical Association of America, Institute of 
Mathematical Statistics, London Mathematical Society, Edinburgh 
Mathematical Society, Union Matematica Argentina, and others 


Subscriptions accepted to cover the calendar year only. 
Issues appear monthly except July. $20.00 per year. 


Send subscription order or request for sample copy to 


AMERICAN MATHEMATICAL SOCIETY 
80 Waterman Street, Providence 6, Rhode Island 





JOURNAL OF THE 


ROYAL STATISTICAL SOCIETY 


Series B (Methodological) 
Vol. XVII, No. 1, 1955 


The Journal of the Royal Statistical Society is published in two series: Series A (General), four issues a year, 
15s. each part, annual subscription £3.1s. post free; Series B (Methodological), two issues a year, 22s.6d. each 
part, annual subscription 45s.6d. post free. 


Permutation Theory in the Derivation of Robust Criteria and the Study of Departures from Assumption 
    G. E. P. Box and S. L. Andersen (With Discussion) 
Some Problems in the Statistical Analysis of Epidemic Data 
    Norman T. J. Bailey (With Discussion) 
Statistical Methods and Scientific Induction 
    Sir Ronald Fisher 
Pivotal Quantities for Wishart’s and Related Distributions, and a Paradox in Fiducial Theory 
    J. G. Mauldon 
Confidence Intervals for the Parameter of a Distribution Admitting a Sufficient Statistic when the Range 
Depends on the Parameter 
    V. S. Huzurbazar 
Least Squares Regression Analysis for Trend-Reduced Time Series 
    G. H. Jowett 
Numerical Investigation of Least Squares Regression Involving Trend-Reduced Markoff Series 
    . F. Scott and V. J. Small 
Sampling Experiment on the Powers of the Records Tests for Trend in Time Series 
    G. Foster and D. Teichroew 
Moments of Negative Order and Ratio-Statistics 
    H. A. David 
A Rectifying Inspection Plan 
    Zivia S. Wurtele 


The Royal Statistical Society, 21, Bentinck Street, London, W. 1 








SANKHYA 


The Indian Journal of Statistics 
Edited by P. C. Mahalanobis 


Vol. 15, Part 3, 1955 


Completeness, Similar Regions, and Unbiased Estimation—Part II 
    E. L. Lehmann and H. Scheffé 
Certain Modified Forms of Binomial and Poisson Distributions 
    V. M. Dandekar 
A Note on the Structure of a Stochastic Model Considered by V. M. Dandekar 
    D. Basu 
Analysis of Dispersion for Multiply Classified Data with Unequal Numbers in Cells 
    C. R. Rao 
Approximate Probability Values for Observed Number of “Successes” from Statistically Independent 
Binomial Events with Unequal Probabilities 
    John E. Walsh 
Tables of Two-Sided 5% and 1% Control Limits for Individual Observations of the r-th Order 
    Motosaburo Masuyama 
Modified Mean Square Successive Difference with an Exact Distribution 
    A. R. Kamat 
Estimation and Tests of Significance of the Components of a Time-Series 
    O. Suryanarayana 
A Note on a Form of Tchebycheff’s Inequality for Two or More Variables 
    D. N. Lal 
Unbiased Test for a Specified Value of the Parameter in the Non-central F Distributions 
    N. Marakathavalli 


Annual Subscription: 30 rupees ($10.00), 10 rupees ($3.50) per issue. 
Back Numbers: 45 rupees ($15.00) per volume; 12/8 rupees ($4.50) per issue. 
Subscriptions and orders for back numbers should be sent to 


STATISTICAL PUBLISHING SOCIETY 
204/1 Barrackpore Trunk Road Calcutta 35, India 













